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Source Coding for Multiple Descriptions 
lil: A Binary Source 


By H. S. WITSENHAUSEN and A. D. WYNER 
(Manuscript received March 19, 1981) 


A uniformly distributed (iid) binary source is encoded into two 
binary data streams at rates R; and R2, respectively. These sequences 
are such that by observing either one separately, a decoder can 
recover a good approximation of the source (at average error rates 
D,, D2, respectively), and by observing both sequences, a decoder can 
obtain a better approximation of the source (at average error rate 
D,). In this paper a “converse” theorem is established on the set of 
achievable quintuples (Ri, R2, Do, D:, D2). For the special case R, = 
R. = 1/2, Do = 0, and D, = Dz = D, our result implies that D = 1/5. 


l. INTRODUCTION 


Let {X;}%-1 be a sequence of independent drawings of the binary 
random variable X, where Pr{X = 0} = Pr{X = 1} = 1/2. Assume that 
this sequence appears at a rate of 1 symbol per second as the output 
of a data source. (Refer to Fig. 1.) An encoder observes this sequence 
and emits two binary sequences at rates Ri, Re s 1. These sequences 
are such that by observing either one, a decoder can recover a good 
approximation to the source output, and by observing both sequences, 
a decoder can obtain a better approximation to the source output. 
Letting D,, Dz, and Do be the error rates which result when the streams 
at rate R,, rate Re, and both streams are used by a decoder, respec- 
tively, our problem is to determine (in the usual Shannon sense) the 
set of achievable quintuples (Ri, R2, Do, Di, D2). Our main result is a 
“converse” theorem which gives a necessary condition on the achiev- 
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able quintuples. This paper extends a previous one on the same sub- 
ject.’ This paper, however, is self-contained. 

This problem is an idealization of the situation in which it is desired 
to 

(t) send information over two separate channels, as in a packet 
communication network, and 

(ii) recover as much of the original information as possible, should 
one of the channels break down. 

To fix ideas, let us say that R; = Re = 1/2, Do = 0, and D, = D2 = D. 
Thus, the source sequence at rate 1 is to be encoded into two sequences 
of rate 1/2 each, such that the original sequence can be recovered from 
these two encoded sequences with approximately zero error rate (i.e. 
Do = 0). Our question then becomes: How well can we reconstruct the 
source sequence from one of the encoded streams—that is, what is the 
minimum D? A simple-minded approach would be to let the encoded 
streams consist of alternate source symbols, which will allow Do = 0. 
In this case, D = 1/4, since by observing every other source symbol a 
decoder will make an error half the time on the missing symbol. Is it 
possible to do better? El Gamal and Cover” have looked at this problem 
and have a theorem which can be used to show that we can make 
D = (V2 — 1)/2 = 0.207. In a previous paper’ it was shown that (with 
R, = R2 = 1/2, Do = 0) D= 1/6. The new result given here specializes 
to D = 1/5 = 0.200. The exact determination of the best D remains 
an open problem.* 


ll. FORMAL STATEMENT OF PROBLEM AND RESULTS 


Let B = {0, 1}, and let dz(x, y), x, y ¢ B’, be the Hamming distance 
between the binary N-vectors x, y; i.e., dy(x, y) is the number of 
positions in which x and y do not agree. A code with parameters 


*In Ref. 3, Witsenhausen proved a closely related result which encourages the 
conjecture that D = 0.207 is, in fact, the best possible. 
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(N, Mi, Me, Do, D1, D2) is a quintuple of mappings (fi, fo, Zo, 21, £2) 
where, 


fe: BY > (1, +++, Mi}, a= 1, 2 (1a) 
Za: (1, 2,°+-, Ma} > BY, a=1,2 (1b) 
Bo: {1, 2, --», My} X (1, 2,---, Ma} > B”. (1c) 


The source output is a random vector X uniformly distributed on B”. 
Define 


Y = g1°fi(X), (2a) 
Z = g2°f2(X), (2b) 
and 
X = gol A(X), A(X)]. (2c) 
Then the average error rates are 
1 
D, = N Edr(X, Y), (3a) 
Dz =~ Edu(X, 2), (3b) 
N 
Do = = Edy(X, X). (3c) 


We say that a quintuple (Ri, Re, do, di, d2) is achievable if, for 
arbitrary « > 0, there exists, for N sufficiently large, a code with 
parameters (N, Mi, Ms, Do, D1, D2), where M, = 2'"**%, a = 1, 2, and 
D, = d. + €, a = 0, 1, 2. The relationship of this formalism to the 
system of Fig. 1 should be clear. Our problem is the determination of 
the set of achievable quintuples, and our main result is a converse 
theorem. 

Before stating our result, let us take a moment to state a positive 
theorem by El Gamal and Cover’ as it specializes to our problem. 
Theorem 1: The quintuple (Ri, R2, do, di, dz) is achievable if there 
exists a quadruple of random variables X, X, Y, Z, which take values 
in B, such that Pr{X = 0} = Pr{X = 1} = 1/2, and 


Edu(X, X) < do, (4a) 
Edu(X, Y) = di, (4b) 
Edzx(X, Z) < do, (4c) 
and 
R, =1(X; Y), (5a) 
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R2 = 1(X;Z), (5b) 
Ri + Ro =1(X;X, Y,Z) + 1(Y;Z), (5c) 


where I(-;-) is the usual Shannon information. 

For the special case of R; = R2 = 1/2, do = 0, it can be shown that 
d, =d,= (v2 — 1)/2 = 0.207 is achievable. 

We now state our converse result. 
Theorem 2: If (Ri, Re, do, di, dz) is achievable, then in all cases 


R, + R2=1-h(d), (6a) 
furthermore, if 2d, + dz = 1, then 








2 
Ry + Rez 2— do) A (2d + de - 2E), (6b) 
— 2 
and if d, + 2d2 = 1, then 
2d3 
Ri + Re =2—h(d) —h d; + 2d2—7 mae (6c) 
— 1 
where 
0, \=0, 
h(A) = 4 —AlogeA — (1—A)logo(1—A), O<AS1/2. 
1: A= 1/2. 


All logarithms in this paper are taken to the base 2. As (6a) is obvious, 
and (6c) follows from (6b) by symmetry, we need only prove (6b). 

In the special case of Ri = Ro = 1/2, do = 0, and d; = d2 = d, 
inequality (6b) implies that 


2d? 
n(aa- 225) 21, 


oes tay 


which implies that d = 1/5 = 0.200. 





or 
9” 


iil. PROOF OF THEOREM 2 
We start from the standard identity 
I(U1; U2 Us) =-I(Ui; U2) + I(Ui; Us| U2), (7) 


for arbitrary random variables U;, U2, U3. We say that Ui, U2, U3 isa 
“Markov chain” if Ui, U3 are conditionally independent given U;; i.e., 
U3; depends on U;, U2 only through U2. If Ui, U2, Us is a Markov chain 
then I(Ui; U3| U2) = 0, and from (7) 
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I(U,; U3) Ss I(Ui; Uz Uz) = (Ui; U2). (8) 


Note that the hypothesis for (8) holds when U3 is a function of U2. A 
sequence {U,,} is a Markov chain if, for all n, 


(- ee Un-2, Uni); Ons (Un+1, Un+2, aor -) 


is a Markov chain. 
Let us now suppose that we are given a code (fi, f2, Zo, £1, 82) with 
parameters (N, M,, Mo, Do, D:, Dz). We can write 


log M, + log Mz = H(fi(X)) + H(fo(X)) (9) 
= I(fA(&); fa(X)) + H(A(X) A(X) 
= I(fi(X); fe(X)) + 1K; A(X) ACK) ~— (10) 
= I(fi(X); f(X)) + 1(X; &, Y, Z), (11) 


where (9) follows from the fact that f;(X) takes its values in a set of 
cardinality M;, (10) holds because the pair fi(X) f2(X) is determined by 
X and (11) holds because X, Y, and Z depend on X only through fi (X) 
f2(X) so that (8) applies. 

Now (11) is getting close to (5c) in the direct theorem. 
In fact, using (8) twice, we can underbound /[fi(X); f2(X)] by 
I(Y; Z). Now the components of the source vector X are independent, 
and if the components of either Y or Z were also independent, we 
could make use of standard techniques to establish the necessity of 
(5c). But alas, we cannot assume that either the {Y,,} nor the {Z,} are 
independent, so that another tactic is required. The key idea is the 
definition of another random vector V = (Vi, --- , Vi) the components 
of which are in fact independent. 

For 1 = k <= M,, define the set 


Ax = {x: fix) =k} = fi'(k). (12) 


Let the cardinality of A; be pz. Let the random vector V be defined by 


its conditional distribution given X: 
—~1 
x > A x)s 
Pr(V = v|X =x} = tg Py ve Age 


otherwise. (13) 


Thus, given X € Az, V is uniformly distributed on A;. It follows that 
the unconditional distribution for V is 

Pr{V =v} =2”", veB’, 
and the components of V are independent, as desired.* Furthermore, 
Z, fo(X), X, fi(X), V is a Markov chain, so that, using (8), 


*In effect, V is obtained from fi(X) by a channel with transition probabilities 
Pr{X = x|/fi(X) = 2} so that the distribution of V is the same as that of X, hence, iid. 
This is valid for any distribution of X. 
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T(fi(X); fo(X)) = 1(V, A(X); A(X), Z) = I(V; Z). (14) 
Combining (11) and (14), we obtain 


1 1 1 1 “ 
— log M, + — log M2 = ls Z) + FS X, Y, Z) 


N N 

a I(V; Z) + as 1(X; X) 

“Ne N-” 

(1) Edz(V; Z) Ed;(X, X) 

= 2 — h(A) — A(Do), (15a) 
where 

_ Edn(V; Z) 
A= ag (15b) 


Step (1) follows from the “rate-distortion bound” which states that 
if U is a random vector uniformly distributed in BY (as are V_ and 
X), and U is an arbitrary binary random vector, then J(U; U) = 


i= oF Edx(U, o| (See Ref. 4.) 


We will now obtain an upper bound on A in terms of D; and Dz. As 
a “warm up,” let us observe that from the triangle inequality, 


A= < Edx(V, Z) = 5 [Edi(V, Y) + Ed:lY, X) + Edy(X, Z)]. 


Now 
Edx(Y, X) = DN, Edy(Z, X) = DN, (16) 
Furthermore, 


Ed:V, Y) = ¥. Eldutv, Y)|V = v] Pr{V = v}. 


Now suppose that we are given V = v € A,x. Then, Y = gi(k). Since 
Pr{V =v} =2°%, 


M, 
Ed:(V, Y) = x oF 2-dulv, &i(k)] 


k=1 veA;, 
M, 
= 5 Y P(X =x}dulx,()J=ND. (17) 
=1 xeA, 
Thus, 
As 2D, + Daz. (18) 
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Substitution of (18) into (15a) yields that for achievable 
(Ri, Rao, do, di, d2) 


Ri + Re = 2 — h(2d, + d2) — h(do), (19) 


which is the result reported in Ref. 1. 
We will now establish a tighter bound on A, namely, for D2 + 
2D, = 1, 
1 2Di 
A =— Edx(V, Z) = D2 + 2D, - ———.,, 20 
aj BdulV, Z) = Dz + 2D, ~ -—— (20) 
so that (15) yields that for achievable (Ri, Re, do, di, d2), 
2di 
1-— dz 





Ri + Rh, 2=2- a( a + 2d, — — h(do), (21) 


which is (6b), the inequality required for Theorem 2. 


Upper bound on A: We establish inequality (20) as follows. Let k, 
1<=k=M, be fixed. Let A; be as defined in (12), and let its cardinality 
[x = pt. Let the members of Az be the N-vectors {Xm}in=1. Let y = gi(k). 
Thus, when X = x e€ A,, then Y = y. Now, form a ux N 
array, A, with mth row 


Am = (Am, Amy *** , Imn) =Xm@ y, (22) 


where ® denotes modulo 2 vector addition. Thus, dm, = 1, when the 
nth coordinates of x,, and y are different, and dmn = 0, otherwise. Note 
that 


i 
a Eldu(X, Y)|fA(X) = Rk] 


o> Y die Se yy 
SA ae. Xm, = are mn 
N =1 s: a N BX », 2 . 
1 N 
cam N > Sn, (23a) 
where for l <n<N, 
1 BL 
sn = y Amn (23b) 
Le m=1 


is the fraction of 1’s in column n of A. 
Next, for 1 = m S gp, let Zn = g2°f2(xm) be the value of Z which 
results when X = x,,. Let B be the » X N array with mth row 


Din = (bmi, bm2, ee oy bmn) = Zm © y. (24) 
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Then, 
1 
a E[dx(X, Z)|fi(X) = k] 


) dr(Xm, Zm) 
dy(am, bn) 


1 
ae fF [amn(1 — Bn) + Bma(l — am (25) 


where the last equality nies from the fact that for a, be{0,1}, 
a(1 — 6) + b(1 — a) = 0 or 1 according as a = b or a ¥ Bb. Let 7, be the 
expression in braces in (25). Then, 


1 B 
n= i by [Qmn + Onn — 2AmnDmn] 


m=1 


al 
= re [Qmn _ bmn + 20mn(1 > Amn) | 


m 


1 
=—-—Y (Amn — bmn) = Sn — th 


1 
“2 [Bmn — Amn + 2Amn(1 — bmn)] 


m 


1 
= il > (Omn — Gm) = tn — Sn, (26a) 
where 


Dm lsne N, (26b) 


in = 


es 
iM 


and s, is given by (23b). We conclude that t, = |tn — Sn|, so that 
(25) yields 


1 | aes 
Finally, consider 
1 
N E[du(V, Z)| f(X) = &] 


=H ae [di(V, Z)|X = xn] 
=i > E[du(V, 2m) |X = Xn] 
“a: . ¥ div, Zm) Pr{V =v|X=xXm}. (28) 
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Now, from (13) which defines V, 
-1 
= = = Bboy VE Ap, 
Pr{V = v|X = xn} ts otherwise. 


Thus, (28) yields, 


= EldulV, Z)| fiX) = 2) 


— 





du(Xm’, Zi) 
1 


l 

= 

iM 
Mr 


_ 


m’ 





zd diu(am', b,) 


z|- 






ny [Qmn(1 = bmn) - (1 = Gm'n) Omn] 


1c | 
= ay X [sal = tn) + ta(1 — 52) (29) 


Now make the dependence of s, and ¢, on k explicit by writing sy, 
and t¢,,, respectively. Then, on averaging over k, (23b) becomes 


1 M, 1 N 
D, = N E {dy(X, Y)} = » Pr{fi(X) = k} N x Snk 


M,; N 
= ¥Y Pin, k)sne, (30a) 


k=1 n=1 


1 
where P(n, k) = Pr{ fi(X) = k} xy Similarly, (27) becomes 


Di == BldalX, Z)}= y 5 P(n, k)|tu — Sn|.  (30b) 


k=1 n=1 


Finally, (29) becomes 


A= ~ E {di(V, Z)} 


M, N 
= ¥Y Pin, k)fsne(1 — the) + tne(1 — Sne)]. (30c) 
k=1 n=1 
We now apply the following inequality, the proof of which is given 
in Section IV: 
Let S, T be random variables such that 0 = S, T= 1, and E{S} <= 
Di, E{|T- S|} = Da, with 2D, + Do = 1. Then, 





2D? 


2 


E{S(— T) + T(1—S)} s Do + 2D, - (31) 
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Let S, T be the random variables which take the value snz, tne, 
respectively, with probability P(n, k), then (80) and (31) imply that, 
for 2D; + D2 = 1, 
2D3 


= D2 + 2D, - 
A p) 1 1— D,’ 





(32) 


which, when substituted in (15) gives (6b), proving Theorem 2. 


IV. PROOF OF THE INEQUALITY 


Define Q(Di, D2) as the supremum of E{S(1 — T) + T(1 — S)} 
over all distributions of (S, 7) on the unit square [0, 1]? for which 
E{S} =D, E{\|T — S|} s Dr. 

Theorem 3: (a) For 2D; + D2 = 1 one has 

2Di 
1 — D,’ 
with Q(0, 1) = 1. (b) For 2D, + D2 = 1 one has 


Q (Di, Dz) = 2D; + Dz - 


1 
Q (Di, D2) = 3 


To establish this, introduce for S, T, x, y in [0, 1], y ¥ 1, the 
function * 


F(S, T, x, y) =S(1-—T)+T(1-S) 
4x 2x" 

+ —2](S—x) + | ——, -1|(|S-T|- y). (38 
Lemma: For 2x + y= 1, y< 1 the maximum of F(S, T, x, y) over all 
(S, T) in (0, 1} is 2x + y — 2x7/(1— y). 

Proof: For fixed S, the maximum of F' over T must, by piecewise 


linearity, be at either JT = 0, S, or 1. 
(z) If T = 0, then 


2 
F=8+( = 2) (S—x) + a, (S— y), 
I y ) 











= (l-y 
and this is maximized over S at either S = 0 or S =1. 
(a) ForS=0 
4x? 2x? 
I= y (1—- y) 
2x? one 2x?y 2x? 
= 2x + y —- —— —- —— —- Se exty : 
7"T=y 1l=-y (—y) oy 


* This choice of F comes from the duality theory of convex hull formation, as 
described, e.g., in Section VA of Ref. 5. 
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(b) ForS=1 





4x(l1—x) 2x? 
op 











F=1-24+2x4+ aL ARS 
1-y Ly 
2 Ax 2 
I ag Oh peg eg ee 1. 

(uw) If T = S, then 

4x Qx7y 

P= 25 - 25° + ( ~2) (S-x)+ — 
1-y "d= yf 


The maximum over S of this quadratic is at S = x/(1 — y) which is in 
(0, 1] if x= 1 — y and this holds as x = (1 — y)/2 = 1 — y. Hence, the 
maximum is 


2x 2x? (4 -2)( *) 
t= Sa) ey L=y 


(wi) If T = 1, then 


4x 25" 
Pa1-8+(; =-2] 6-9) + |= a] (i625): 

















= (l-y 
which is maximized by taking S either 0 or 1. 
(a) For S=0 
2 2 2 9 2 
F=2x+y-—— esccees aie 
bay Sy A) 
2x? 
=2x+y- 
Ly 
(b) For S = 1, 
4x 4x” 2x" 
F=2x+ y+ ~—— — 2+, 
~"T=y I-y (L=y)* 
2x? 25° 4 2x? 
ee ec ene = Ie ee ee 
Ly by Lay (1- y) 
2 2 
Ss 2x+y- a j 
ey 


since 4x/(1 — y) $2. 
Thus, the maximum is as stated and is attained for T = S = x/ 
(1 — y) and for T = 1, S = 0. This completes the proof of the lemma. 
Turning to the proof of Theorem 3, consider any distribution of 
(S, T) on the unit square for which 
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E{S} =x, E{|T-S|}=y, 2x+yHl. (34) 


If y = 1, then x = 0 from which it follows that S = 0, T = 1 almost 
surely, giving F{S(1 — T) + T(1-—S)} =1. 
If y < 1, one has, by the lemma, 


Qx? 


at 

If one chooses the distribution JT = 1, S = 0 with probability y and 
T = S = x/(1 — y) with probability 1 — y, equality is attained. This 
determines the maximum of E {S(1 — T) + T(1 — S)} subject to (34). 
As 2x + y — 2x?/(1 — y) is monotone increasing (for 2x + y < 1) in 
both x and y, the maximum is unchanged if one allows all x, y with 
0<=x=D,0 ys Dz. This establishes part (a) of Theorem 3. 

For part (b), it suffices to observe that Q is monotone nondecreasing 
in both arguments by its definition and that on the boundary, where 
2D, + Dz = 1, one has Q(D;, D2) = (1 + De2)/2. This establishes part 
(b). (It could easily be shown that Q(Di, D2) = (1 + De)/2 for all 
(D;, D2) in the unit square satisfying 2D, + D2 = 1.) 





E{F(S, T, x, y)} =E{S(1— T) + TMU -8)) Saat y—s 
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Microprocessor Firmware Update Inventory 
Model 


By S. M. BRECHER 
(Manuscript received June 10, 1981) 


Microprocessor-based systems are used in many applications of 
modern telecommunications. The controlling program of most micro- 
processor systems is stored in firmware which is usually coded into 
erasable programmable read-only memory chips (EPROM). As new 
services are implemented, there is a continuing need to update the 
firmware, a potentially expensive process. In this paper, a method is 
presented for determining the resources necessary for updating 
EPROM. firmware from a centralized location by using a rotating 
inventory scheme. 


|. INTRODUCTION 


Microprocessor-based systems are used in many applications of 
modern telecommunications. The controlling program (firmware) of 
most microprocessor systems, is coded into either read-only memory 
(ROM), programmable read-only memory (PROM), or erasable pro- 
grammable read-only memory (EPROM) chips, which are mounted on 
circuit boards. As new services are implemented in the microprocessor- 
based systems, there is a continuing need to update the firmware. 
Frequent updating of firmware is more effectively accomplished by the 
use of EPROM chips, because they can be repeatedly erased by exposure 
to ultraviolet light and reprogrammed by the use of a specially designed 
unit. Typical EPROM circuit packs may require 0.5 hours for erasing 
and 1.5 hours for programming. 

One serious drawback to using EPROM firmware is that it cannot be 
altered by means of a data link. A change in firmware entails the 
removal and reinstallation of the memory circuit boards or packs. 
Because of this manual process, updating a large number of micro- 
processor systems may involve a long and costly procedure. 

The object of this paper is to provide a quantitative method to 
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determine the level of spare boards and programming units necessary 
to achieve a predeterminated time span for updating all the micropro- 
cessor-based systems. 


1.1 Firmware update process 

Typically, there are three conditions that could affect a firmware 
module: (2) Program updates because of new feature introduction, (zz) 
program changes caused by fixing of “bugs,” and (iiz) program changes 
because of hardware updates. In general, a firmware update process 
begins with a notice of a change sent by the microprocessor manufac- 
turer to centralized programming sites where ultraviolet light pro- 
gramming units and a spare inventory of circuit packs are kept. Over 
a dial-up connection, the programming unit receives the latest program 
version and writes it onto spare circuit packs taken from inventory. 
When all spare circuit packs are rewritten they are shipped to specific 
distribution sites associated with a subset of the microprocessor sys- 
tems. From each distribution site, craft persons are dispatched to 
install the updated boards. The removed circuit packs are then re- 
turned to the centralized programming site for updating. The recycling 
process continues until all the microprocessor systems in the field, plus 
their allocated maintenance spares, are updated. 


1.2 Basic model 

The flow and assumptions of the basic update model, for a typical 
process schedule, are shown in Fig. 1. The process of updating begins 
when a firmware change message is received at the centralized pro- 
gramming site. An initial period is used for administrative procedures, 
unpacking of inventoried circuit packs, erasing, and reprogramming, 
packing and crating, and shipping of the circuit packs to the distribu- 
tion sites. Out of this initial time interval, it can be assumed that one 
day will be needed for reprogramming the spare EPROM circuit packs. 
If there are not enough programming units, a delay in the update 
process could be incurred. 

Once the rewritten boards arrive at the distribution sites, the asso- 
ciated microprocessor systems are scheduled for update. The process 
of coordinating the installation forces necessary for the update is not 
immediate, and the following distribution can be assumed to charac- 
terize the update interval: a proportion, 1, of the updated boards are 
installed and an equal amount of outdated boards are returned to the 
centralized location for reprogramming in one week; a proportion, 22, 
of the updated boards are recycled in two weeks; a proportion, 73, in 
three weeks, and so on until all the boards are recycled. 

Once the outdated boards arrive back at the programming site, the 
process of reprogramming begins again. This procedure continues until 
all of the microprocessor systems in the field have been updated. 
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Fig. 1—Typical firmware update process. 


ll. SYSTEM PARAMETERS 


The system parameters to be determined are the update time, the 
total update spares, and the number of programming units. Update 
time, 7, is defined as the interval in weeks between a change notice 
and the time at which all the systems in the field, including mainte- 
nance spares, have been updated. To determine T, it is necessary to 
define the concepts of update spare, total update spares, total systems, 
and spare ratio. 

An update spare is defined as one complete set of EPROM circuit 
packs for one microprocessor system. Since an update spare is defined 
as one full set of boards, an update spare may be equated to a 
microprocessor system and vice versa. Total update spares is defined 
as the number of EPROM circuit packs at the centralized programming 
site available for initiating the update process. Total systems is defined 
as the total number of microprocessor-based systems in service and 
their maintenance spares to be updated when a program change is 
issued. 

The spare ratio, S, is defined as the ratio of total update spares to 
total systems; that is, 
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_ Total Update Spares 
Total Systems — 


A spare ratio of one, S = 1, says that for each microprocessor system 
in the field and its maintenance spares there is one set of update spares 
at the programming site. In this case, T is the time required for one 
pass through the basic flow shown in Fig. 1. Similarly, if S is 0.5, one 
spare for every two systems in the field and their maintenance spares, 
T is the time required for approximately two passes through the basic 
flow. By following this reasoning, it becomes clear that T is a function 
of S. 


ill. MODEL FORMULATION 
3.1 Update time 

The update time can be determined by considering the following 
installation distribution mentioned in Section I and shown in Table I. 
In Table I, 4; = proportion of boards updated in week j, w = number of 
weeks required for returning all spares updated in week 0; i; = 0, for 
j = 0Oand/ > w, 1.e., no system can be updated in week 0 with spares 
reprogrammed in that week. Then, 


Ww 
Y=, 
jJ=0 


which implies that after w weeks all the spares reprogrammed in week 
0 are returned. Define g(n) as the quantity of spares updated in week 
n, with n = 0, week of change message, and n = T, week by which all 
systems are updated. The update distribution can be modeled by 


q(0) = Total Update Spares = S x Total Systems, 
q(1) = iig(0), 


q(2) = i2g(0) + nq(1), 
q(3) = i3q(0) + tqg(1) + nq(2), 


, and 
n-1 
q(n) = ¥ in-Q(J). (1) 
j=0 
Equation (1) can be rewritten as: 
q(n) = >) yg(n — J), (2) 
j=0 


since 1, = 0, and with initial conditions 
q(0) = Total Update Spares = S x Total Systems, 
q(n) = 0, for n<0. (3) 
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Table |1—Distribution of 
microprocessor systems 


update 
Week of Update Proportion 
Process Updated 
1 iy 
2 la 
3 ts 
Ww bs 


Equations (2) and (3) are the difference equations describing the 
microprocessor update process. To obtain the solution of the difference 
equation, define the ratio of total update spares updated in week n: 


q(n) 


I 4 
Total Update Spares’ (4) 


a(n) 


where 
0<a(n) <1. 


Substituting g(n) from eq. (4) into eq. (2) 


a(n) = ¥) a(n —j), (5) 


jJ=0 
and 
a(0) = 1. (6) 


Before solving eq. (5), notice that the update process ends when the 
total number of updated spares equals the number of total systems; 
that is, 


T 
¥. a(n) = Total Systems, (7) 
n=1 


where T is the update time. 
Substituting eq. (4) into eq. (7) and using the definition of spare 
ratio, 
1 
Zam =% @ 
Expressions (5), (6), and (8) are the difference equations for the weekly 
ratio of updated spares and T. 
To obtain T, eqs. (5) and (8) must be solved. The solution can be 
obtained using a computer simulation, or a closed-form technique such 
as the z-transform. To use the z-transform method, recall the following 


properties:’ 
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(t) Translation property: 
k-1 


ZU f(n + k)] = 2*F(z) — 2" y fe" 


where F(z) = Z[ f(n)]. 
(ti) Summation property: 


z| > yy fo = F@) + z f(n). 
(uit) Final value property: 
lim f(t) = lim (z — 1)F(z). 


Using property (z) in eqs. (5) and (6) gives: 
aed 
A(z) =—_——, (9) 
zv— SS yz”? 
j=l 


where i and w were previously defined, and 


A(z) = Z[a(n)]. 
Using property (ii) in eq. (8), and the initial conditions for a(n), gives 
T 
Z y ate | =—=_ A(z). (10) 
n=0 agate | 


Substituting eq. (9) into eq. (10) and considering that i, = 0, gives 


w 


T ye 
Zz j=l 
z| a(n) | = : 
n=1 2Zv — > 20? 
j=l 





The inverse z-transform of eqs. (9) and (6) gives the following relation- 
ship between T and S: 


Ww 
Due 2 


7 ry (11) 
ge = Se 
j=l 





The right-hand term of eq. (11) can be inverted using classical tech- 
niques and z-transform tables. 

An example can be used to show an application of the z-transform 
technique. Consider the firmware update distribution shown in Fig. 1 
and given in Table II. 
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Table Il—Firmware 
update distribution for a 
six-week turnaround 
interval (w = 6) 


Week of Proportion 
Update Updated 
Process ij 

3 0.10 

4 0.50 

5 0.30 

6 0.10 


Substituting the above values of 1; in eq. (11), 


1 z 0.123 + 0.522 + 0.382 + 0.1 ! 


i= Z7 re 12 
z—-1 2°—-0.1z? — 0.52” — 0.32 — 0.1 (12) 





S 


Using the partial-fraction expansion technique for inverting eq. (12), 
the closed-form expression for S as a function of T is 


4 = 0.22727T — 0.36983 


+ 0.09374(—0.67495) 7 
+ 0.03228(0.45089) *sin(2.27567T + 1.23494) 
+ 0.82472(0.85369) “sin(1.41837T + 0.85769). (13) 


A closed form solution to this transcendental equation in T is difficult 
to obtain. An approximate solution is obtained when T is large. In this 
case 


1 
3% 0.22727T — 0.36983, (14) 
or 
T = = + 1.62725, (15) 


which is the expression for T. Equation (15) satisfies all update times 
with an error of less than 2 percent for T = 10, and with a maximum 
error of 7 percent occurring in week 7. 

An integer solution can be obtained by making: 


4.4 
T= E + 16 | (16) 


with a maximum positive error of 1 week. The [-] operator denotes 
rounded-up approximation. Figure 2 shows the representation of the 
update time as a function of the spare ratio for this example. 
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UPDATE TIME 7 IN WEEKS 
=) 


0 0.2 0.4 06 0.8 1.0 
SPARE RATIOS 


Fig. 2—Update time. 


A more simplified approach was used for solving the original real- 
life update time problem. From the given installation distribution, the 
expected number of weeks required to program, ship, and install a set 
of spares, i.e. one pass through the basic flow of Fig. 1, is 


T=0.1X3+05x4+0.3X5+0.1 X 6 = 4.4 weeks. 


This means that one reprogramming process for the average micro- 
processor system takes 4.4 weeks. The update time is determined by 
the number of times this process will be repeated, plus the error 
introduced by using the expected value approach. The number of 
times the process is repeated depends on the spare ratio. Therefore, 
the update time can be represented by the following equation: 


4.4 ai 
= — + error 
S 3 
which agrees with eq. (16) for an error of 1.6. 
The update time algorithm shows the dependency between 7, S, 


and the microprocessor’s update distribution. For an illustrative pur- 
pose, assume that there are as many spare boards at the programming 
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site as microprocessor systems in the field, i1.e., S = 1. In this case, 
T = 6 weeks. This period is the minimum time possible under the 
given basic assumptions. On the other hand, if we have spare boards 
available for only 50 percent of microprocessor systems in the field, S 
= 0.5, the update time according to the algorithm is T = 11 weeks. The 
conclusion is that Tis directly proportional to the installation distri- 
bution and inversely proportional to S. 


3.2 Total update spare requirements 


The number of update spares necessary at the centralized program- 
ming site is determined by the update time and the total number of 
microprocessor systems. The update time determines the spare ratio 
which, when combined with the number of systems, gives the number 
of update spares as indicated in Section II: 


Total Update Spares = [S X Total Systems]. (17) 


For the example of Section 3.1, the update spare requirements for 
implementing a firmware update system for 150 total systems in 20 
weeks is computed as follows: 

First, the spare ratio for 20 weeks from Fig. 2 is; 


S = 0.24. 
Then, the number of update spares is given by 
Total Update Spares = [0.24 xX 150] = 36. 


3.3 Programming units 


The number of programming units necessary for a centralized op- 
eration can be determined as a function of their capacity. To determine 
the number of units, two concepts must be used: weekly spares and 
programming days. 

Weekly spares, g(n), already defined in Section III, is the number of 
spares returned each week to the site for reprogramming. The number 
of weekly spares is a function of the installation distribution. Program- 
ming days, P, is the number of days per week allocated to the erasing 
and programming of EPROM circuit packs. 

The number of programming units is the following function of the 
weekly spares, the number of programming days, and the unit capacity: 


, q(n) _ 1 
Er ts = | —— x — 
rogramming Units Pp C 


| for all n, 
where C, microprocessor systems per day per unit, is the capacity 
factor of the unit. The number of programming units can be computed 
using the three following approaches. 

An upper bound for the weekly spares can be used as a starting point 
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for determining the number of programming units. The number of 
total update spares available at the site can be used for this purpose. 
In this case, the number of units is: 





; ; q(o) 
Programming Units = FR 5 | (18) 
This algorithm overestimates the number of programming units be- 
cause only in the initial week of the update process are weekly spares 
equal to update spares. In the following weeks, the number of spares 
returning vary according to the assumed installation distribution. 

A second approach for determining programming units is to make 
the weekly spares equal to the largest number of spares updated in 
any week after the initial week of the update process. In this case, the 
number of units is: 





; q(n) 
Programming Units = max Px at (19) 
This algorithm reduces the number of units required and introduces a 
delay in the first week of the update process. 

A third approach is to make the weekly spares equal to the expected 
number of spares returned each week for reprogramming. This is 
equivalent to the steady-state of the discrete time process. This ap- 
proach satisfies the programming assumption of one day in 50 percent 
of the weeks and incurs a one-day delay during the other 50 percent. 
Therefore, on the average, programming will require 1.5 days per week. 
As a result, the nondelayed update time, which was derived assuming 
one day per week for programming, must be increased by 10 percent 
(one-half a day per five-day week). In this case, the number of units is: 


| @ 
Programming Units = ee ; (20) 


where @ = expected number of spares returned each week. To evaluate 
Q, the final value theorem can be used. Property (iii) of the z-transform 
methodology gives: 


Q@ = g(~) = lim(z — 1)Q(2) 
= Total Update Spares X lim(z — 1)A(z). (21) 
z1 
Substituting eq. (9) into eq. (21) yields: 


= — 1 w 
Q@ = Total Update Spares x ee 


1 Ww ’ 

z . —7 

Bi Vee 
a 
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which is indeterminate of the type 0 over 0, because )) 1; = 1. The 
j=0 
application of l’Hopital’s rule solves this problem. 
For the distribution given in the example of Fig. 1, the expected 
number of spares is: 
(z — 1)2° 
z°—0.12z?-0.5z7—-0.38z—-0.1 


= Total Update Spares X 0.22727 
= Total Update Spares x A, 


Q=q() =Total Update Spares x lim 
zl 


where A is the expected ratio of total update spares returned each 
week. It can be seen that the reciprocal of A represents the average 
number of weeks required for one pass through the basic flow of Fig. 
1; that is, 
| 1 
T = — =—__——_ = 44 ks. 
A 0227270 e"s 
For the example of Fig. 1, where one day per week is assumed for 
reprogramming, and the unit capacity is three systems per day, eqs. 
(18), (19), and (20) become, respectively: 


Programming Units = | Pose eda Spares) (22) 
Programming Units = | Petal Update Spares and (23) 
Programming Units = | Poe eens (24) 


Comparing eqs. (22), (23), and (24), it can be seen that eq. (24) reduces 
the number of programming units by a factor of more than four, with 
respect to eq. (22), with the introduction of a 10-percent delay in the 
update time. 

The algorithms discussed above are represented in Fig. 3. The 
number of required programming units are shown as a function of the 
number of total update spares. To contain the size of the plot, a scale 
factor, N, was introduced. Also included is a parameter D, which is the 
additional delay in the update time because of the reduced number of 
programming units at the centralized site. 


3.4 Example 


To have a better understanding of the determination of the resource 
requirements for the firmware update process, consider the following 
example. 
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D: DELAY 
N: SCALE FACTOR 
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Fig. 3—Programming units. 


For a projection of microprocessor systems, it is desired to determine 
the total update spares and programming unit requirements for the 
years 1982 and 1983, for an overall update time of 10 weeks in 1982 
and 15 weeks in 1983, under the assumption of providing the fewest 
number of programming units. (See Tables III, IV, and V.) 


IV. ECONOMIC IMPACT OF FIRMWARE UPDATE SCHEMES 
4.1 Economic dependencies 

Based on the obtained algorithms, an economic analysis of an update 
scheme can be performed if the capital and expense costs associated 


with the process are known. 
The capital expenditures for the firmware update process are due to 


2304 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1981 


Table IIl—Forecast of microprocessor 


systems 
Total Update Time 
Year Systems Weeks 
1982 26 10 
1983 116 15 


Table IV—Update spares from Fig. 2 


Nondelayed Total 
Update Spare Total Update 
Year Time ratio Systems Spares 
1982 9 0.59 26 15 
1983 13 0.39 116 45 


Table V—Programming 
units from Fig. 3 


(D = 0.1) 
Programming 
Year Units 
1982 2 
1983 4 


the total update spares and programming unit requirements. There- 
fore, the capital is a function of the desired update time, the total 
number of microprocessor systems, and the delay tolerated. If the 
capital budget is exceeded, the total update spares and programming 
units must be decreased to meet the dollar constraints, increasing the 
update time. The update time is independent of the number of micro- 
processor firmware changes; therefore, the capital is also independent 
of the change rate. 

The update expenses are due to the labor costs associated with the 
programming unit operation, the installation efforts, and the costs of 
crating and shipping. Since these tasks are performed for each micro- 
processor system each time a program change occurs, the expenses are 
dependent on the system market and program change rate. The 
economic dependencies are shown in Fig. 4. 

Using standard techniques, we can determine the economic feasibil- 
ity of alternate memory schemes for microprocessor-based systems. 
One alternative, for example, could be to determine the benefits of 
replacing EPROM memory with random access memory (RAM) and 
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Fig. 4—Economic dependencies. 


magnetic bubble store as backup. This possibility, even though initially 
more expensive, has the capability of being updated electrically via a 
data link, and may offer savings over the life cycle of the system. 


V. SUMMARY 


Algorithms related to the EPROM firmware update process for micro- 
processor systems have been developed in this paper. They give the 
update time, the number of update spares, and the numer of program- 
ming units required at a centralized site for a given firmware installa- 
tion distribution. The algorithms presented do not render policy deci- 
sions, but enable the user to determine the economics and responsive- 
ness of alternative administrative schemes. 
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Systems 
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(Manuscript received June 18, 1981) 


Coherent phase-shift keying (cpsK) and differential phase-shift 
keying (DPSK) are widely used modulation methods in digital com- 
munications. Bandwidth efficiency, good noise immunity, constant 
envelope, and simplicity of implementation make these schemes par- 
ticularly attractive for use over the satellite, terrestrial radio and 
voiceband telephone channels. While system analyses abound in the 
literature, treatment is usually restricted to the additive Gaussian 
channel. Important issues determining ultimate performance, such 
as the joint effect of intersymbol interference and the acquisition of 
carrier phase have not been adequately addressed. The main purpose 
of this paper is to develop analytical tools that can be used to assess 
system performance under practical operating conditions. Pure co- 
herent demodulation schemes such as cpsk are ideals which are 
rarely achieved in practice, and carrier phase must be estimated 
prior to and/or during data transmission. This requires start-up 
time, as well as added equipment, and the fidelity of the phase 
estimate ultimately determines performance. In contrast, DPSK is 
independent of carrier phase, since decisions are made on phase 
differences. However, this comes at a price, and it is known that ideal 
multiphase DPSK suffers an asymptotic performance penalty of 3 dB 
in signal-to-noise ratio (s/n) over ideal cpsk. We develop a new 
rigorous method for calculating the error rates of both cpskK and 
DPSK, under a variety of operating conditions. In particular, we find 
that the intersymbol interference penalty for quaternary DPSK is about 
1 dB worse in s/n than for cPsK. We demonstrate that the detection 
efficiency of CPsK approaches the ideal, provided that the s/n of the 
phase-recovery circuit is about 10 dB more than that at the receiver 
input. Alternatively, for the same s/n, a 10-baud phase-locked loop 
integration time is required to achieve near-ideal performance. 
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l. INTRODUCTION 


Coherent phase-shift keying (cPsK) and differential phase-shift key- 
ing (DPSK) are two techniques often used in digital communications 
over channels such as satellite, terrestrial radio, and voiceband tele- 
phone. The literature abounds in analyses of their performance under 
a variety of conditions. A sample collection of some of this literature 
may be found in Ref. 1. The chief reasons for the widespread use of 
these techniques are simplicity of implementation, superior perform- 
ance over the additive Gaussian noise channel, minimal bandwidth 
occupancy, and minimal envelope variation. 

The relative performance of CPSK and DPSK systems is well under- 
stood only in the presence of additive Gaussian noise. In this case, the 
detection efficiency of DPSK is known to be about 1 dB (in s/n) below 
that of cpsK for binary modulation and this degradation approaches 3 
dB for multilevel systems. In applications where a 3-dB loss in s/n is 
important, such as in down-link satellite, space communications, and 
terrestrial radio under deep fading conditions, cPSK is the preferred 
method. In cpsk, however, the generation and extraction of a local 
carrier-phase reference at the receiver is required. A coherent phase 
estimate is usually obtained by using phase-locked loop (PLL) tech- 
niques, and because of frequency instabilities and phase jitter inherent 
in transmitter and receiver systems, carrier recovery loop bandwidths 
cannot be made arbitrarily small. Consequently, in practice a noisy 
phase estimate is obtained and only partial coherent reception can be 
claimed. The reason for using DPSK is its immunity from slow carrier- 
phase fluctuations; therefore, the phase recovery problem inherent in 
CPSK is avoided. However, the detection efficiency of DPSK may ap- 
proach that of cpsk under noisy phase estimation conditions and 
intersymbol interference (1s1). The need to understand this phenome- 
non on a fundamental level is the principal objective of this paper. 

As bandwidth occupancy is always important, the effects of IsI 
generated by the use of band-limiting filters must be taken into account 
in any analysis of these systems. Because of the linear nature of the 
demodulation process in CPSK, the effect of 1st has been treated in 
great detail. Since DPsK demodulation is inherently nonlinear, the 
analysis of performance is very difficult and no adequate analytical 
methods are currently available. Also, the combined effects of imper- 
fect phase estimation and ISI on CPSK must be determined so that the 
relative detection efficiencies of band-limited DPSK and CPSK can be 
fairly assessed. 

In Section II of this paper, we describe a technique for determining 
the degradation in M-ary CPsK operating in the presence of ISI, additive 
Gaussian noise, and imperfect carrier phase. In Section III, we consider 
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the performance of M-ary DPSK subject to 1sI and additive Gaussian 
noise. 


Il. COHERENT DETECTION 
2.1 System description of CPSK 


Figure 1 shows the M-ary CPSK system that we consider. The signal, 
s(t), before the transmit filter can be represented as 


s(t) = Re{Ax(t)exp[i(27fet + p)]}, i= V—-1, (1) 


where the baseband modulation signal is 


eo 


x(t)= ¥ exp(iaz)rect[(¢ — kT)/T], (2) 
k=—oo 

and the constants A, f., and p are the carrier amplitude, frequency (in 
Hz), and phase, respectively. Also, rect(-) is the rectangular window 
function, T' the signaling interval, and the sequence of discrete phases 
[a,] corresponds to the data sequence to be transmitted. Without loss 
of generality, we assume that the M phase values of a; are uniformly 
distributed with equal probability between (—7, 7]. So, a, takes on 
value in the set a,eA, 


a|7 37 2M — 1 
nal Zz 7 WT |. 
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Fig. 1—M-ary cpsK receiver, M > 2. 
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We also assume that the data phases in different time slots are 
statistically independent. 

In our model, the transmit filter, transmission channel, and receive 
filter are linear and time invariant. Therefore, the complex envelope, 
y(t), at the output of the receive filter may be written as 


y(t) = x(t) @ h(t) + n(t) + m(t) 
h(t) = hr(t) © hc(t) @ hr(t), 


where h(t), hc(t), and Ar (t) are, respectively, the impulse response of 
the transmit filter, the channel, and the receive filter. The symbol 
© denotes convolution. Also, n(£) + in (t) is the complex envelope of 
the Gaussian noise passed through the receive filter. For symmetrical 
filters, n (t) and7i(£) are independently and identically distributed (iid) 
Gaussian random variables with mean zero, and variance 


5° = No | | He(f)|"df, 


where No is the double-sided spectral density of the original white 
noise and Hr(f) is the baseband equivalent transfer function of the 
receive filter. 


2.1.1 Detection in CPSK 
Assuming that the recovered. carrier is exp[i(27f.t + fi )], where ji is 
an estimate of in eq. (1), the detector operates on the signal, w(t), 
represented as 
w(t)= )) 2(¢— kT )expli(a. + €)] + §+ in, (3) 


k=—0o 
where é and 7 are iid gaussian random variables with mean zero, and 
variance o”, 
e=—E~—H, 
is the phase error, and 


2(t) = hr(t) @ Ac(t) @ halt) @ rect (=) 


To estimate the transmitted phase, ao = PeA at t = 0, an ideal cPSK 
detector measures the phase 6 of w(t) at ¢ = to, and a correct decision 
results when 


T 7 
@®-~-—<9#@<O4+— 
M > 


M 
6 = phase angle of w(t), 
w(to) = w(t)! ey 
ayn 
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2.1.2 Error rate for M-ary CPSK 


Here, we briefly review some known results for CPSK and then 
develop new results applicable to our more general model. 

Error-rate calculations for ideal cPSK in added Gaussian noise can 
be found in Refs. 2 to 7. References 8 and 9 provide numerical methods 
for calculating the probability of error in the presence of IsI. Reference 
10 takes into account IsI and demodulation phase error, but the results 
are restricted to only binary and quaternary systems. We now gener- 
alize these results. 

Using the union bound and the representation of the received signal, 
eq. (3), it follows from eq. (4) that the probability of error, Pe(|®), 
given that the phase © is transmitted, is 


max(Pi, P2) = Pe(|®) = P, + Po, 


where 
P, = Pe sin( 0 —-O+ Z| < o| 
a Pr m wteyexn| -i( — z) | < of, 
P, = Pr| sin{ @ --® -—)>0 
2=fTrr|{ sin Mu 


= Pr{ tm teens] —1 (© 4: 7) | > of. (5) 


Note that the average symbol probability of error Pe is 


Pe =a Y Pe(|o). 


DeA 


But, since the signal constellation is assumed to be circularly symmet- 
ric, Pe(|®) is independent of ®. 
For convenience, we shall now assume that ® = 7/M. Hence, 


P,= Pr( Im} resp i( A + a + ‘)| 
+ Ynenp|i( + ap,+ -)]| +y< 0), (6) 


rpexp(iBz) = z(to — kT) 


and )\’ denotes the exclusion of the term k = 0. A similar expression 
can be written for P». 


where 
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Accurate estimations of P; and P2 are easy to obtain in the presence 
of only Gaussian noise, but are more difficult when IsI is added and 
are even more tedious when the distribution of carrier-phase error, «, 
must be taken into account. 

In the next section we derive an exponentially tight upper bound on 
these quantities for a fixed carrier-phase error and then perform 
asymptotic [large signal-to-noise ratio (s/n)] analyses on these upper 
bounds for a given distribution of carrier-phase error. 


2.1.3 Bounds on the error rate 
We begin by writing eq. (6) as 


7 


Pi=< Pr| 16 +y< -nsin( 3 + Bot 7 





be 


where ( ). denotes expectation with respect to e, and where, 
I(e) = ¥’rx sin( Bx, + a + ©). (8) 


Before we can proceed with eq. (7), we need specific information on 
the probability density function (pdf) of the demodulating phase error 
e. We shall assume that the phase reference is derived from a pure 
tone by a first-order PLL. It is well known’ that the resulting pdf for 
the phase error, ¢, is 


exp(A cos €) 
2mIo(A) 


where A is the s/n at the input to the PLL multiplied by the reciprocal 
of the PLL bandwidth, 


p(€) = ,lel=7, (9) 


G 
N,Bi 





A= (10) 
In eq. (10), G is the average power in the carrier, N, is the double- 
sided noise spectral density, and Bz, is the noise bandwidth of the 
linearized PLL. Also, in eq. (9), Jo(x) represents the modified Bessel 
function of the first kind and of order 0. For a second-order PLL, the 
pdf of € is also approximately given by eq. (9). We shall use this density 
to obtain bounds on P). 
Since € is a symmetric random variable, eq. (7) yields 


1 
P, = 5 (Vie) + V(-e))., 
where 


Vie a ert] MSONG/ + Bo +e] +’ rasin(an + By + 2), a1) 


V2.6 
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Using upper bounding techniques and Laplace’s method,"’ we show in 
Appendix A that 


Py = Jia + Joa, 


where 


Fes 1 | exp{—p"[sin’[(7/M) + Bo — €0] + D(1 — cos €)]} 
ae) {cos €o + (2/D) cos 2[(7/M) + Bo — e0]}!” 


exp{—p” sin’[(7/M) + Bo]} | 2 ré 


{1 + (2/D) cos 2[(7/M) + Bol} Sige +o A; 


r 
2 2 
o1 => ri, D=-5, 


a 7 2 7 
go = 5 sin 2 (= + fo)| 1-5 cos2 (Z + po) + - |, D> 1. 
and 


doa = (: — °) V22Dp* exp[—Dp?(1 — cos 5)], 6 = a + Bo. 
T 


Note that p” is the s/n of the system. Also, D can be regarded as the 
ratio of s/n in the phase recovery circuit to that in the PSK system or 
the integration time in bauds. 

Similarly, we can show that 


Pos dia + doa. 


In summary, the average symbol probability of error, Pe, for M-ary 
CPSK system can be upper bounded by 


piace See ociem [ M Ro al 2 coe) 
< {cos €9 + (2/D) cos 2[(7/M) + Bo — €]}'” 


exp{—p” sin*[(7/M) + Bo]} 
{1 + 2/D cos 2(2/M + Bo)}1 


+ 2[1-220*81) aapp 


This upper bound becomes 


P=2 exp| ~o° sin” (Z + fs) | (12) 
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when phase estimation is perfect, D — ». Equation (12) is the well- 
known Chernoff bound for M-ary cpsK.” 
If the observation interval of the PLL is large, D > 1, and if M > 1, 


Ty 7 2 7 
€ = 5 sin 2( E+ fa)| 1-5 eo (E+) 


and 
P< exp( -p*sin'(Z + ps) 
2 cos’[(7/M) + Bo] 2 7 
+ exp| -p sin'(Z + fs) 
~ exp( -p*sin'(Z + ps) 


x{1 2 See) + Bo] E — = cos (z+ ts) |), 


ré 


2 IM > 2, D> 1. 13 
p 2(o7 + a7) ‘ (13) 

Comparing eqs. (12) and (13), we see that the degradation in s/n 
because of imperfect phase estimate for multiphase CPSK systems is 
asymptotically given by 


-1 
2 o{ 7 2 T 
G=|{1-Zeos(F +p) fs 700s 2( + fo) } ; 
where G— 1 as D— ~ as it should. 


2.2 Example of quaternary (M = 4) CPSK system 


Let us consider a quaternary (M = 4) cpsk system and assume that 
the channel is ideal. 

If 4-pole Butterworth transmit and receive filters are used, the 
resulting average symbol probability of error is plotted in Fig. 2. Note 
that the bound is fairly tight and when the s/n of the phase recovery 
circuit is about 10 dB more than at the receiver input, the detection 
efficiency of CPSK is essentially determined by Isi alone. Alternatively, 
we can say that, for the same s/n, a 10-baud PLL integration time is 
required to achieve this 1sI-limited performance. For this filter, the IsI 
penalty is about 1 dB. 

If M > 2, it is well known that the penalty in s/n because of Gaussian 
noise alone is asymptotically given by 1/[sin?(7/M)]. 
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QUATERNARY CPSK 


10-3 __S/N PARAMETERS 
D=4 


EXACT 


— — UPPER BOUND 
10-4 


10-5 


PROBABILITY OF ERROR 


10-6 


10-7 


10-8 





10 12 14 16 18 20 22 24 
SIGNAL-TO-NOISE RATIO IN DECIBELS 


Fig. 2—Probability of error for quaternary phase-shift keying (QPSK) with rectangular 
signaling, noisy carrier-phase recovery, and 4-pole Butterworth transmit and receive 
filters. The s/n in decibels is defined as 10 logio{ T/2No], where No is the double-sided 
noise spectral density and the ideal received signal power has been normalized to unity. 
Parameter D is the ratio of s/n in the phase recovery loop to that in the psK system. 
The double-sided 3-dB bandwidth of the transmit filter is 2/T and that of the receive 
filter is 1.06/T. Sampling time is 1.747. 


The upper bound in eq. (13) indicates that if the definition of s/n is 


modified to take into account the IsI power o7, the additional penalty, 
because of imperfect phase estimation, is 


G.= sin Z a Bs) 
eles eer ee. eee pea . 
D cos (F fs) — D cos (z ps) : 


This quantity is plotted in Fig. 3. We observe that the s/n penalty 
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SIGNAL-TO-NOISE RATIO PENALTY IN DECIBELS 





5 7 9 11 13 15 7 19 
SIGNAL-TO-NOISE RATIO D 


Fig. 3—Signal-to-noise ratio penalty for M-ary cpskK with imperfect phase estimation. 
Parameter D is the ratio of s/n in the phase recovery loop to that in the PSK system. 


because of ISI is independent of M. In Fig. 3, also note that G; > 
1/[sin?(7/M)] as D > o., 


Ill. DIFFERENTIAL DETECTION 
3.1 System description of DPSK 

The M-ary DPSK system is shown in Fig. 4. As before, the baseband 
modulated DPSK signal can be represented as 


co 


x(t)= ¥ exp(iaz)rect[(t — kT)/T]. (2) 

k=—© 
Here, however, the sequence of phases [ 8; ] = [az+1 — ax] corresponds 
to the data sequence to be transmitted. Again, we assume that M 
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Fig. 4—M-ary DPSK system. 


phase values of ; are equally distributed over the interval [0, 277) and 
choose 


By = (21 - 1) - 1</1<M, modulo 27. 


As in CPSK, we represent the set 
a 37 T 
a oe eee (OM = 1 
|= M ( ) M 
by [A]. Also, we shall assume that the phase symbols, £;’s, in different 
time slots are statistically iid. 
If the received phasor at time fo is indicated by z and the one in the 


succeeding interval is indicated by 2a, the detected phase difference 
measured by an ideal differential detector is 


6 = angle of w, w A z*za, 
where * represents the complex conjugate. For the system shown in 


Fig. 4, 


z= Yo (ge + ipzexp(ia,) + ne + ins, (14) 
k 


=—00 


and 


Za= Yi (gr-1 + e-i)exp(lar) + ne, + iNs,, (15) 


k=—a 


where g; and p; are real, 





i — kT 
2n + ipr = | A(to — preet(# du. (16) 
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As before, A(t) is the overall impulse response of the system with 
transfer characteristic 


H(f) = Hr( f)Hc( f)Hre(f). (17) 


In eqs. 14 and 15, ne, Ns, Ne,, and ns, are iid real Gaussian random 
variables with mean zero and variance 


ive] 


as | | He(f) |’df, 


where No, is the double-sided spectral density of the added white 
Gaussian noise. In eq. (17), Hr is the transfer function of the transmit 
filter, Hr, of the receive filter, and Hc(/f), the transfer function of the 
channel. The assumption that the Gaussian noise at to) is independent 
of the noise at f — T can be justified if the receive filter bandwidth is 
small compared with 1/7. Most of our analysis can be extended if 
these two noise samples are correlated. 


3.1.1 Probability of error for M-ary systems 


If the transmitted symbol associated with the time index k = 0 is 
®eA, a correct decision is made when the received phase difference 0 
is such that 


or T 
—-—<6§<@+—. 
- M M 
As before, the following bounds apply: 
max(P:, P2) = Pe(|®) = Pi + Pa, 


where 
P,= Pr sin( —~O+ z) < o| 
P> = Pe sin(@ _~o- z) > 0]. (18) 


These statements are identical to the ones that apply to cpsk, but here 
@ represents a “differential phase” and, therefore, the estimation of 
these probabilities becomes extremely involved. Good estimates are 
only available when Gaussian noise is the sole source of impairment. 

We proceed to analyze P, and observe that a bound on P; also 
provides a bound on P». Since the calculations are extremely tedious, 
we relegate the details to appendices and strive to develop only the 
main ideas here. Therefore, from eq. (18), we get 
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pla) = Py sn(o-0 +2) <0] 
-nfinven[~(0-5)| 
= Pr tm 2*asexp| -i(@- 2) <o} 


= Pr{ Re 2)" exp] -1 (© — a + 3) < of 


= Pr(Re zfz2< 0), (19) 


where 


Z21=2 


22 = zeexp| (° _ a + =). 


and z and 2g are given in eqs. (14) and (15). Since 
2 

















peigtes eel slew (20) 
2 2 : 
eqs. (19) and (20) yield 
P,({®) = Pr(|wi| < | wel), (21) 


where 


Z2it+ Ze 2 + zgexp — i[® — (7/M) + (2/2)] 
2 2 





WwW, = 


foo] 


1 : 
> 3 {( + ipr) + (ga-1 + Pr-1) 
k=—0 

x exp| -i( @ ies a a z) Jexpiaw + & + ins 


Z1— 22 2— zgexp — i[® — (7/M) + (7/2)] 
oP ae ea eae laa 


loo) 


> . {( + ipr) — (ge-1 + ipe-1) 


k=—00 


x exp| -i( - a + Z) | Jexptian + é + i7-, 


and where £4, 4+, €-, and n_ are Gaussian noise terms, given by 
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ne + Re(nea + insa) exp — i[® — (2/M) + (x/2)] 


b= 5 

nett Im(rca + insa) exp — i[® — (7/M) + (27/2)] 
+= ee 

ga Re Re(nea + inea) exp = i[® = (a/M) + (2/2)] 
== ee eee 


and 


_ Ms ~ Im(‘teq + insa) exp — t[® — (7/M) + (7/2)] 
Feats Stee Boe gS Ge ae = he oes 


It can be verified that the above Gaussian random variables are iid 
with mean zero and variance o7/2. 


3.1.2 Exact computation of probability of error 


For a given symbol sequence, the conditional probability of error is 
seen from eq. (21) to be given by the probability that a particular 
Gaussian quadratic form exceeds another. This is a well-known prob- 
lem and the answer can conveniently be expressed in terms of the 
tabulated Marcum Q function.’® Thus, after some algebra, eq. (21) can 
be shown as“ 


P, (|®, symbol! sequence) 


-zea|1-o( ), 
oi + 08 Voi + 0 Voi + 03 
o3 a A+ 
+=—— Q| ———. , —=— }, 22 
ara (a ae) a 


where 





Q(a, 6) = [ exp( - a _ =) laa, 
6 

and J,,(-) is the modified Bessel function of the first kind and of 
order n, 

A+ = |(Widenn, |, 

a = |(We)¢_n_|; 

of = (&) = (i) = 0/2 
and 


of = (22) = (9?) = 0°/2. 
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The major difficulty at this point is clearly carrying out the averages 
in eq. (22) over all possible symbol sequences. In general, a_ and ay 
contain an infinite number of IsI terms and the averaging process is 
difficult. Clearly, for a small number of ISI terms, it can be carried out 
by enumeration. But, in general, the number of terms in computing 
the average explodes exponentially and enumeration becomes intrac- 
table. For example, for 10 1s1 terms and a quaternary DPSK system, the 
number of terms is about a million! So, we obviously need more 

_efficient methods of estimating these averages.” 

In this paper, we assume that the number of dominant ISI terms 
contained in a+ and a_ is not large and that they become insignificant 
when ISI samples are far away from the desired sample. Assuming that 
the same number N indicates the number of dominant preceding and 
succeeding ISI samples (total significant ISI terms is 2N), our approach, 
then, is to obtain upper and lower bounds on P, as a function of N and 
demonstrate that these bounds coincide with N > . 

For any N, the evaluation of these bounds requires M?% computa- 
tions. This can be carried out with modest effort on a high-speed 
digital computer. The error becomes smaller when N is increased. 

We show in Appendix B that the error probability can be bounded 
as 


xi(N) = Pi(|®) S x2(N), 


where 


1 
1+ (1 — A)’ 


xi(N) 
xf 7 (al V2a+ V2a_(1 — A) YI 
ov1 + (1— A)?) ov1 + (1 — A)? 
jy ae A)? (4 V2a_(1 — A) V2a4 ) 
1+ (1 — A)’ oV1l + (1— A)? ovi1 + (1— AY’ 


[s(t 


k>N 


A? 2 
-4 exo( — ea (23) 





and 
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1 


ee y., 


{1 (ol V2a+ V2a_(1 + A) 
\ oV1 + (1+ A)? ov1 + (1 + A)? 
(1+ Ay / V2a_(1 + A) V2a- 
rear oV1 + (1 +A)? oV1+ (1+ A)? 


[i (EB) onl ga 


k>N 


2.2 
+4 exo( = 3H seit (24) 


k<-N 
k>N sy) 


In eqs. (23) and (24), A and p are arbitrary, 0 = A < 1, and 


G, = |; { + ips) + (gpa + in exo| -i( @ a at =) 


H, = E { + ipr) — (gr-1 + imnero| -i( @ — ~ + 5) : 


For any N, we can choose A and p by trial and error so that the 
difference between the upper and lower bounds is a minimum. Since 
this optimization is not critical to our method, we choose 


_ [20% ((a?) aan lies 
[ES +m 5s)] 


oR = max y Gi SS i), 
k<-N k<-N 
k>N k>N 
For A < 1, the difference, Z, between the upper and the lower bounds 


can be shown as 








where 


2 


5 sax Onin 
z= 2a) 1+ ce 2 Jpae.m 
—(a*)\ op (a) o 
+8 exo) eA ( + a s) 
A (a2) o 
+ sel -—sag] Sts) 
k<-N . 


k>N 
A (a=) — 
k<—-N 


k>N 
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where 


l- @e9)) 
+(e). 


and a+ and a_ contain only the first 2N significant terms. When N —> 
oo, Z in eq. (25) can be seen to approach zero. 

Since a+ and a-_ in eqs. (23) and (24) contain a finite number of ISI 
terms, we can use the direct method to evaluate the averages and then 
compute the bounds. We choose the initial N so that og < 1, A = 
Vor. We then increase N so that the desired accuracy of computation 
is achieved. 


3.1.3 Upper bound on the probability of error 


Since the exact evaluation of P(|®) is difficult—though we have 
developed in the last section numerical techniques which can be used 
to compute P, with any desired accuracy—we attempt to derive an 
upper bound on P. 

Although our bounding approach seems reasonable, the final bound 
that we obtain turns out to be loose. Our purpose in including this 
section is to alert readers about this approach and to emphasize the 
importance of the tedious, but necessary, computations outlined in 
Section 3.1.2. 

To facilitate our bounding techniques, we need the following rela- 
tions. For any two random variables x and y and any two real numbers 
a and A, we can show (see Fig. 5) that 


Pr(x > a+ A) — Pr(y < —A) 
= Pr(x+y>a) 
= Pr(x > a — A) + Pr(y> A). (26) 
Equations (21) and (26) yield 
P,(|®) = Pr(|we| — | wi] > 0) 
<= Pr(|w2| > A) + Pr(—|wi| > —A) 
= Pr(|we| > A) + Pr(|wi] <A), (27) 


where A is arbitrary. We choose A > 0 so that the upper bound in eq. 
(27) is a minimum. The method of choosing A will be discussed later. 

Now, for any complex random variable z = x + ty, we can show (see 
Fig. 6) that 
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. 


Gj 
(b) 


Fig. 5—(a) Upper bound on Pr(x + y > a), x and y, any two random variables and A 
is arbitrary. (b) Lower bound on Pr(x + y > a), x and y, any two random variables and 
A is arbitrary. 


(a) 


Pr(|z| > a) 
< Pr(|Re z| > a) + Pr(|Imz| > Va? — a3). 
Hence, 
Pr(|w2| > A) 
= Pr(|Re w2| > Ai) + Pr(|Im we| > Az), AZ+AZ=A?, (28) 


where 


Rew2=é + ¥ Cycos(az + Ax) 
k 


=—0 


fo] 


Imwe=n-+ YY Cysin(az + Az) 


kh=—00 
Ch = F { + ipr) — (gr-1 + inex ~i(o - G + =) | 
and 
Crexp(ids) = + {(@ ee ip.dexw| -i( @ ats | ; 
2 M 2 


Since the real and imaginary parts of w2 are the sum of a Gaussian 
random variable and a set of interference terms, various methods given 
in Refs. 16 to 18 and 19 to 27 can be used to bound Pr(| Re w2| > A) 
and Pr(|Im we| > Az), Ai + Aj = A’. Even though the other bounds 
are sometimes claimed to be tighter, we shall use the simpler Chernoff 
bounds. 
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Fig. 6-—Upper bound on Pr(|z| > a). Parameter A satisfies 0 =< A <a. 


hE 






‘Pe 


Appendix C shows that 
Ai-C Ao — Cup)? 
P(e] > A) = 2 exp] ~ eat 426 xl - (aa = Cw } 








) 
where 
Ai=A cos($ + aa) 
Avs sin( 2 : a) 
Cun => max {[Cocos(ao + Ao) + C,cos(ai + Ai) J} 
a: 
oO" M 
Cuz = max{[{Cosin(ao + Ao) + Cisin(ai + A:)]} 
So ye 
ao = M 
o = YC} 
and 
yy” A > 
k= 
he 
k 
o, for any complex random variable z = x + iy, we can show (see 
Fig. ae tha t 
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. 
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Fig. 7—Upper bound on Pr(|z| < a). 








jImz|<a ez 






Pr(|z| < a) Ss Pr(| Re z| < a, |Im z| < a) 
Pr(|z| < a) = Pr(| Re z| < a) 


Pr(|z| < a) = Pr(|Im 2| < a). 


Hence, 
Pr(|w,| <A) = Pr(|Re wi| < A). 
Since 
Re w; = |Re wi|, 

Pr(| Re wi| <A) = Pr(Re w: < A), 

and 
Pr(|w,| <A) S Pr(Re w; < A). 

We write 


Re w; = & + »> D,cos(az, + dz) 


k=—0 


Im w= n+ + YY Dysin(az + 62), 


k=—co 


D, = 5 { + ipr) + (Zr-1 + inners] —i( 0 _ a + =)|} 


Dyexp(id;) = ; { + ipr) + (gr-1 + ioexp| -i( @ = a ay z) | 


2326 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1981 


and 


YY Dacos(az + dx) 


k= -a 


= Docos(ao + 50) + Dicos(a; + 61) + &” Dzcos(az + 82). 


Using Chernoff bounding techniques, it can be shown that 


Pr(|w:| <A) = exp| - al 
where 
Du = min[Docos(ao + 50) + Dicos(a; + 61) ] 
a1 — ao = = 
M 
oh = "D- 
The upper bound on P;(|8o = 7/M) can, therefore, be written as 


7 


Ai — Cm)’ 
r=) =2e0| S| 


= 2 _ 2 
3.9 exo| - (Az — Cuz) | is exp|- (Du — A) | 





o” + 0 o + 0%. 
Ai — Cm = 0, Az — Cue = 0, Du - A200, AT+ AZ=A* = (29) 


The bound is minimum when A, A, and A» are chosen so that the 
derivative of eq. (29) is zero. This can be found by using well-known 
numerical methods. 


3.2 Example of quaternary (M = 4) DPSK system 


Let us consider a quaternary (M = 4) DPsK system and assume that 
the channel is ideal. 

If 4-pole Butterworth transmit and receive filters are used, the 
bound given by eq. (29) is plotted in Fig. 8. The bound with zero ISI is 
plotted in Fig. 9. The exact probability of error with IsI is plotted in 
Fig. 10. With or without Is1, the bound is unfortunately not very tight. 
Actually, one can show that the penalty as predicted by the bound 
with zero ISI is about 4.6 dB worse than the actual penalty for a binary 
system, and 8.3 dB worse for a quaternary system. This is inherent in 
our techniques and not the result of using Chernoff bounding methods. 
In our opinion, obtaining tighter bounds is still an open problem. 
Comparing Figs. 2 and 10, we note that IsI penalty for quaternary 
DPSK is about 1 dB worse than for cpsk. We needed 9 isi terms to 
compute Pe with 5 percent accuracy. 
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UPPER BOUND FOR DOPSK 


10-3 


_-—WITH ISI 


10-4 


10-5 


PROBABILITY OF ERROR 


10-6 


1077 


10-8 





10-9 
22 24 26 28 30 32 34 36 


SIGNAL-TO-NOISE RATIO IN DECIBELS 


Fig. 8—Upper bound on the probability of error for differential gPsK (DQPSK) with 
the same transmit and receive filters as in Fig. 2. Other assumptions are as in Fig. 2. 


IV. SUMMARY AND CONCLUSIONS 


For multiphase M-ary cpsk, we develop an analytical procedure for 
determining detection efficiency when the system is subject to additive 
Gaussian noise, ISI, and imperfect carrier-phase estimation. For a large 
s/n, we provide a simple formula for calculating the combined penalty 
caused by IsI and noisy phase recovery. For multiphase DPSK, where 
the detection is inherently nonlinear, a rigorous method is developed 
for calculating the error rate in the presence of ISI and additive 
Gaussian noise. Using these analytical techniques, it is possible to 
compare the performance of CPSK and DPSK and examine various 
parameter trade-offs. Numerical examples are provided to illustrate 
our methods. 
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DQOPSK WITH ZERO 


7 
UPPER BOUND—~ 


PROBABILITY OF ERROR 





14 16 18 20 22 24 26 28 
SIGNAL-TO-NOISE RATIO IN DECIBELS 


Fig. 9—Upper bound on the probability of error for DQPSK with zero ISI. 


APPENDIX A 


Chernoff Bound on the Probability of Error 


From Section 2.1.3, 


P, = | [V(e) + V(—e) ]p.(e)de 


7 | [Vie) + Vi-e) |pledde 


= J, + Jo, 


where 
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10-3 


wee. WITH ISI 


10-4 


PROBABILITY OF ERROR 


10-6 


1077 


10-8 





10-9, 
11 13 15 17 19 21 23 25 
SIGNAL-TO-NOISE RATIO IN DECIBELS 


Fig. 10—Probability of error for DQPSK with the same transmit and receive filters as 
in Fig. 2. Other assumptions are as in Fig. 2. Note that 9 1s terms were needed to 
compute Pe with 5 percent accuracy. 


8 
ji = | [V(e) + V(—e)] p-(e)de 


0 


Jo = | [V(e) + V(—e)]p-(e)de, (30) 
6 


~ and where V(e) is given in eq. (11). Note that sin(7/M + Bo — €) > 0 for 
0s€<7/M + Bo; also, sin(7/M + By + 6) > 0 forOSe<a7— (7/M 
+ Bo). Hence, sin(z/M + Bo + €) > 0 for 0 < € < 6, where 


§= min| 2+ Rn (i+ fs) | 
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Since 
0 = erfc(x) = 2, 
0=[V(e) + V(—e)] = 2, 
and, therefore, 
J, = 2 [ plod = Pr(|e| > 6). 
Now, from eq. (11), 


V(e) = Pr| =n —I(e)> rsin( + Bot . | (31) 


where 7 is a zero mean Gaussian random variable with variance o” and 
I(e) is given in eq. (8). 
Using the Chernoff bound 


Pr[x > a] S$ exp(—pa)(exp(ux)), p20, 
eq. (31) yields 


V(e) = exp| ~Ansin( 7+ Bo + . ewan + I(e)]}. (32) 


For a given e, J and 7 are independent, and since the data phases 
a, in different time slots are iid, 


2 
exp{—A[n + I(e)]} = exp xe []’ (exp[—Ar:sin(@, + ar + €)])o,- (83) 


We shall now assume that M is an even number so that if DeA, 
(7 + ®)eA. Hence, 


(exp[—Arzsin(B, + ax + €)])a,5 0S a,< 27 
1 
=5 (exp[—Arzsin(B; + ax + €)] 


+ exp[—Ar;sin(6; + 7 + az + €)])a,, OS<ar<7 


= (cosh Arzsin(Bz + az + €)) a, O=a.< 7. 
Since 


cosh x = exp(x?/2), 


(exp[—Arzsin(B; + ax + €)])a,, 0S a,< 27 


oe ee 
= (exp > sin (Bx + a + €) | da,» Ox<a,<7 
Xa 2 
= exp( st) (34) 
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From eqs. (32), (33), and (34), 
2 
Vie) = exp| ~Ansin( 2 + Bo + c) Je| (o? + a) A= 0, 
where 
of = DY ri. 
Similarly, 
V(—e) = exp| —Arosin ey es e Lee i) 
= exp 0 Mu 0 € xp 5) oO Of) |. 
Hence, for 0 = ¢€ < 7/M + Bo, sin(7/M + Bo — €) > 0, and 


_ rosin*[(7/M) + Bo — €] 
2(0° + 07) ; 


Also, for 0 S €< 7 — (7/M + Bo), sin(7/M + Bo + €) > 0, and 


V(-e) = exp| 


26D 
Vie = exp| - ee 
Hence, 
S rosin’[(7/M) + Bot €] 
Vie) + V(-e) = exp| - cn 
: _ rosin’[(7/M) + Bo — €] 
oe, Qe + 07) 


= exp| ~otsin'( Z + Bo+ ‘)| 


+ exp| -p'sin'( + Bo- ‘)|. 


2 
2 ro 


OU +0)’ 
0se<6=min| + Bo, 7 — (i+ 6) (35) 


of = YY ri, 


The parameter p’ is the s/n of the system. 


Note that Bo is the phase angle of the complex overall impulse 
response evaluated at ¢t = ¢. In a well-designed system, it is usually 
small, and eq. (35) shows that the optimum value of Bo is zero. Also, 
since fo is usually small and we are interested in M > 2, 6 is usually 


a/M + Bo. 
Thus, from eqs. (9), (30), and (35), we conclude that 
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6 
1 : 7 
J, S dia = sam | {exo| —o'sin'(Z + Bo- .)| 


+ exp| ors’ + Bot . jexmtDeteo elde, (36) 


where the quantity D can be regarded as the ratio of s/n in the phase 
recovery circuit to that in the psk system or the integration time in 
bauds. 

- We have not been able to evaluate eq. (31) in closed form but, if 
desired, numerical techniques can be used. To obtain physical insight, 
we shall assume that p’ > 1 and use Laplace’s method to evaluate eq. 
(36); the technique is an application of the following theorem: Jf h(t) 
ts a real function of a real variable t, has a unique maximum at t = 
a, & S aS a, and if x is a large positive variable, it can be shown 
that" 


a2 1/2 
f(x) = | g(t)exp[xh(t) ]dt ~ g(a)exp[xh(a)] uae ; 


1 


From eq. (36), 


1 
Ca are TE cS) 37 


5 

Jy = | exp| el D cos € — sn'(« mie bs) lee, (38) 
‘ M 
: T 

Je = | exp| oD cos € — sin'(« +—+ ts) |b (39) 
" M 


The saddle point €) at which the exponent in eq. (38) reaches its 
maximum in (0, 6) is given by the solution of 


sin €0 1 


sin 2[ (7/M) + Bo = €o| D : (40) 


The transcendental eq. (39) can only be solved numerically. However, 
we can obtain a series solution for «) by using Lagrange’s reversion 
formula.”’ If a function f(z) is regular in a neighborhood of 2o and if 
f (Zo) = Wo, f’(Z0) ¥ 0, then it can be shown” that 


f(z) =w 
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has a unique solution 


_ 2 n-1 
are ow"| | (41) 


2=Z 


where 
zZ— Zo 
f(z) — wo 


Choosing 2 = 0, eqs. (40) and (41) yield 
) oe 
€ = D sin (z + ps) 


2 
x|1- F 00s 2( Fh) +], D> 1, 0O<a <6. 


O(z) = 


From Laplace’s formula and eq. (38) 


Jy = exp| -p' sin'(Z + Bo- a) — Dcos e|| 


ee, eee Ae 
- 2p7[D cos €) + 2 cos 2(7/M + Bo— €)]} “a 


Similarly, it can be shown that the exponent in eq. (40) reaches its 
maximum in (0, 6) at e = 0 and 


am exp| -p' sin'( + ps) — || 


1/2 
T 
| ap + 2 cos(a/M + ai =) 


For p’? >> 1, D> 1, 

exp(Dp’) 
2nDp’ 
From eqs. (37) to (39) and (42) to (44) 

rads 1 (Oe + Bo — €] — D(1 — cos &)]} 
7 2 [cos €) + (2/D)cos 2[(7/M) + Bo — €o]]'” 


[1 + (2/D)cos 2[(7/M) + BoP?) 


I(Dp*) = (44) 


Also, 
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Jo S dq = 2 | p-(e)dt 
, 


exp(Dp’cos ede) 


1 
~ -aIo(Dp’) | 


a — 5 exp(Dp’cos 8) 
T In(Dp”) 


(1 - °) V2mDp* exp[—Dp?(1 — cos 8)] 





R 


po’ > 1, D> 1. 


APPENDIX B 
Upper and Lower Bounds on the Probability of Error in DPSK 
Let us write 
2d = Zan + Ne, + Ins, + Zar, 
where 
zvn= Y (ee + ipr)exp(iaz) 
>—N 
k=N 
Z2= »> (gz + ip) exp(iaz) 
k<—-N 
k>N 
Zan > > (gr—1 + Upr-1)exp (tax) 
k= 
k=N 
and 
Zar = Y, (ge-1 + ipe-1)exp(iaz). 
k<-N 
k>N 


Note that zy and Zen contain a finite number of IsI terms, whereas zz 
and Zar contain an infinite number. Without loss of generality, we shall 
also assume that g;, and p; are monotonic decreasing functions of | k]. 
N is an arbitrary positive number. 

Now 


P,([|®) = Pr(| win + wir| < | wen + Woe)), (45) 
where 
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Zn + ZaneX ifo-742 
N T Zan€Xp M2 


WN a + &, + ins, 


- SOs ae 
ZR T ZdReXp U M2 


QF aaa 


2 
ZN zeveno| ~i (° - a + =) | 
Wa a ge 
and 
Zr — canes i (° - a + =) 
W2R a, rae 


Note that wiy and Wey contain a finite number of IsiI terms, and €+ + 
ins and ~ + in— are independent zero means complex Gaussian 
processes. 

For any two complex numbers s; and Ss», 


| si] — |se| S| si + so] S| si] + | 50|. (46) 
Hence, eqs. (45) and (46) yield 
Pr(|win| < |wen| — | wir| — | wer|) S Pi(|®) 

= Pr(| win + Wir| < | Wen + Wee|) 


= Pr(|win| < | won| + | wir| + | wer) 


or 


+ 
p,( el DIN ae ae ae < P,(|®) 


| won| | won| 
+ 
= p,( mint — | 4 wrel + [weal 
| won| | wen| 





Using the bound in eq. (26), eq. (46) yields 


+ 
pr( Me 21i= 4) = p(s earls A ) < P\(|®) 


| wen | wen | 


+ 
2p Mla eal [els Toerl , 4), (47) 
| won| | won| 





where we choose 0 =A <1. 
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For any two random variables x and y and any a > 0,”° 
Pr(xy >a) Ss Pr( Ix >2) + Pr(|y| > p), (48) 
where p > 0 is arbitrary. From eqs. (47) and (48), 
p(t 1- 4) — Pr(|wen| < p) 


- Pr(|wir| + | wer | >Ap)=s P,(|®) 





=< Pr | wan <1+A] + Pr(|wen| <p) 
| wn | 


+ Pr(|wir| +|wer|>Ap). (49) 


Since w ny and Wey are independent complex Gaussian processes, 
from Ref. 13, 


2 
p(t ie 4) = at 


| won| of + o3(1 + A)? 


«{1-<@| As. a-(1 + A) ae > 
Vo? + o3(1 + A)? Voz + o3(1 + A)’ 


o3(1 + A)? 
ot + o3(1 + A)? 
+A 2 
x<Q —- ee) a —— ct=o3=—, (50) 
Vop+ 031A)? Voz+o3(1 +A)" 2 


where a+ and a-_ are the appropriate truncated values of as and a- 
defined in Section 3.1.2. Also, from Ref. 18, 


Pr(|wan| <p) =1-<Q (=. 2), (51) 
62 62 
Using Chernoff bounding techniques, we can also show that 
Ap? 
Pr(|wir| > Ap) = 4 exp( - real (52) 
k<—-N 
k>N 
2p” 
k<—-N 
k>N 
where 
2) Geis Gasioenl aos 
R= 5 &k T lpr 8r-1 T lpr-i)eXp M2 
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and 


HA; = 





1 . P . 
5 {a + ipr) — (gr-1 + ison] ~ (® — a+ =) |} 
Equations (49) to (53) yield 


xi(N) s Pi(|®) S x2(N), 





where 
1 
= eae 
{ 20+ V/2a_(1 — A) | 
xi 1-< Q | —————.,, —_=—————_ | > 
oV1 + (1— A)? oV1 + (1 — A)? 
aay <9 V2a_(1 — A) V2a4 | 
i+ (Q-A" *\o¥i+0Q—b) ovit Ua 
= E - <Q (44,-), | 
(eo) oO 
A2p? A2n? 
ERED 5 7 Gi Sean “Ty 
k<-—N k<-N 
k>N k>N 
and 
(Nie 
NT Fe ie AY 


x{r- <a V2a4 V/2a_(1 + A) | 
sil O AY vie +A 
(1 + A)? Q V2a_(1 + A) V2a+ 


1+ (1 + A)’ 


ovl+ (1+ A)? ovl+ (1+ a2)” 
r-<a(t2) 





0 


A’p? A’p? 
+ 4 exp SS Soe + 4 exp = So: 
2 GZ 2 Hi }’ 
On wan ‘ 
k>N k>N 


where A and p are arbitrary. 
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APPENDIX C 
Upper Bound on the Probability of Error 


Now, 
Pr(| Re we| > Ai) = Pr(Re w2 > Ai) + Pr(—Re we. > Ai) 
and 
Pr(Re we = A;) S exp(—pA:) (exp(ps Re we) ) 
= exp(—A:)(exp(u Re w2)). 
Since the Gaussian random variable é~ is assumed to be independent 


of the interference 


(exp(u Re w2)) = exp(uo"/4) exp y Crcos(ap + rf. 
k 


=—09 


Since By = ar+1 — az, and we assume that £;’s are iid, we can assume 
that a,’s are independent, and 


a, E A, k even 
2a 4a 

> M’ M’ 

that is, the signal constellation for odd (even) k can be obtained by a 


simple rotation of the constellation for even (odd) R. 
Let us now assume that the transmitted symbol ® is 7/M so that 


Py(|®) = P, (16. = Z| 


1 
=5jlP (ia = «= 7). 


ae -@M-9 2) a4, k odd; 


Noting that 


yy Cycos(ar + Ax) = Cocos(ao + Ao) + Cicos(ai + Ax) 


k=—00 


+ ¥” Creos(az + Az), 
2:2 





Pr( Re W2 > Ai|a1 — a = Z| <= exp| “A + e 
+ p[Cocos(ao + Ao) + Cicos(a1 + wit 


xX (exp[u ¥\” Crcos(az + Az)]). 


We use the notation ” to indicate that k = 0, and k = 1 terms are not 
included. 
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Now, 
(exp[p 4” Crcos(az + Ax)]) = [[” (exp[wCrcos(az + Axz)]) 
= [[’ (exp[uCzcos(az + Ax)]) 


even 


x [[” (exp[uCrcos(a, + Az)]). 
odd 


Most often, M = 2”, L an integer, and since this assumption simplifies 
our bound, we shall assume that M is an integer power of two. (A 
slightly more complex bound can be derived if M # 2”.) Now, 

TI” (exp[wCrcos(az + Ar) ]) (a,|-2<a,<7) 


even 


T[” (cosh[wC,cos(az + Axz)]) (a,)0<0,<2) 


even 


I’ ; <cosh[uCzcos(az + Ax) ] 


even 


+ cosh) Creos( 3 + ap + rs) p> 


(wlrcas) 
a, Qe=—| 
k ut 2 


II” < cosh| pica ; cos( a + Ar + *) | 


even 


x cosh| nisin” sin( ay + An + *) > 


7 
(« | o<ai<7} 


Since 
cosh x = exp(|x\), 
and 


cosh x < exp(x°*/2), 


ceosh] Creo 7 cosa + AR A 


x cosh| Cos 7 sin( a +Ap+ *) | > = exp(yC;) 


ar] 0: = 
<a 2 


<eosh) uCi00s ; cosa +p + *)| 


202 
x cosh|pCieoss sin( a +An+ *) > < exp(* ey. 
4 4 (ax |OS0x=7/2) 4 
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Identical bounds can be derived when & is odd. Therefore, we have 


2 
{exp[ud” Crcos(a, + Ax)]) S exp (1 ve” Cet - i ct), 


h 
kEQ, REQY 


where {2; is a subset of [---, —2, —1, 2, 3, ---]. For simplicity, we 
choose {2; to be the null set. 


Choosing optimum p, 


Pr( Re W2 > Ai|a1 — ao = Z| 


= | _ {Ar = [Cocos(ao + Ao) + Cicos(or + a) 
= exp =—— {o" + ” Ci} ’ 


Ay — [Cocos(ao + Ao) + Cicos(ai + Ai)] > 0. 


Similarly, we can show that 
Pr( -Re We > Aili — a0 = Z| 
= e {Ai — [Cocos(ao + Ao) + Cicos(ai + Ai) J}? 
<= exp (+ CH) : 
Ai + [Cocos(ao + Ao) + Cicos(ai + A1)] = 0, 


Pr( (tm Ww; | > As|ar ee c= 7) 


< = {Az — [ Cosin (ao + Xo) + Cisin(a; + wn) 
is ae Sc: | a 


(0? vs y C2) 
As + [Cosin(ao + Ao) + Cisin(a; + A1)] = 0, A?+Aj=A%, 


( {Ao + [Cosin(ao + Ao) + Cisin(ar + a 
ek) eee 


and 
(Ai — Cm)? (Az — Cuz)? 
Pr(|w2| >A) = 2 ep(-S So") 2 exp( Ae" , 
where 
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Cm = max{[Cocos(ao + Ao) + Cicos(ai + A1)]} 


T 
a-a== 


M’ 


Cuz = max{[Cosin(ao + Xo) + Cysin(a; + Ai) }} 


T 
agg: 
o2 =” CH. 


For zero IsI, it can be shown that optimum values of A; and A: are 


A,2=A cos($ +i). 


A, =A sin( 2 + a) , 
Even when there is ISI, we shall use these values of A; and Ao. 
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On the Rapid Initial Convergence of Least- 
Squares Equalizer Adjustment Algorithms 


By M. S. MUELLER 
(Manuscript received May 11, 1981) 


Adjustment algorithms for transversal equalizers derived from 
least-squares cost functions are known to converge extremely fast. 
While various simulation results confirming this fact abound in the 
literature, a theory explaining the fast convergence has been lacking. 
This paper reports on steps toward such a theory. For some commonly 
used start-up data sequences it was found that algebraic properties 
of the sampled signal vectors play a critical role in the transient 
behavior of these algorithms; namely, successive signal vectors are 
linearly independent for a large class of transmission channels in 
the absence of noise. After N iterations (N being the number of taps), 
the resulting coefficient vector is found to be related to well-known 
equalizer coefficient vectors. If a single pulse is used as a training 
signal, the zero forcing equalizer is obtained; if a pseudo random 
noise sequence, with a period in symbols equal to the number of 
coefficients is used, the steady-state solution of the cyclic equalization 
is obtained. Thus, after only N iterations, the least-squares algo- 
rithms yield a coefficient vector which is only asymptotically obtain- 
able by gradient techniques. 


l. INTRODUCTION 


Adaptive equalizers are important building blocks in modems for 
digital data transmission over linear dispersive channels. They adap- 
tively mitigate the adverse effects of intersymbol interference. A 
critical parameter in the start-up performance of modems is the speed 
of convergence of the equalizer adjustment algorithm. The overall data 
throughput depends on it and, consequently, a high convergence speed 
is desirable. 

Various different equalizer structures are known at this time. In the 
following, we concentrate on the frequently used transversal filter 
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structure.’ Many equalizer update algorithms are based on the steepest 
descent, or gradient technique, which minimizes the mean-squared 
error (mse) between the equalizer output and the transmitted data 
symbols.” In particular, the stochastic approximation of the gradient 
algorithm with an mse criterion is used frequently. The convergence 
speed of this algorithm was analyzed in Refs. 3 and 4. It was found to 
be dependent on the number of coefficients used and, to a lesser 
degree, on the eigenvalue spread of the channel autocorrelation matrix. 

Several methods to improve the convergence speed of the gradient 
algorithm were published in Refs. 5 to 8. In Ref. 5, prior knowledge of 
the transmission channel is assumed and a transformation of the 
received signal is proposed which reduces the effect of a large eigen- 
value spread on the convergence speed, whereas in Ref. 6 a transfor- 
mation of the correction vectors, yielding the same performance, is 
proposed. Used in conjunction with a stochastic gradient algorithm, 
these methods reduce the convergence time to the minimal time which 
is obtainable with an ideal channel having an eigenvalue spread of one. 
In Refs. 7 and 8 cyclic equalization was proposed as a means to speed 
up the convergence time. The number of iterations required for con- 
vergence of the algorithm is about the same as for the stochastic 
gradient algorithm, but theoretically, the convergence time can be 
reduced to the time required to fill the register of the equalizer. 
Practically, however, the convergence speed is limited by the available 
computational power of the implemented algorithm. 

In Ref. 9, Godard cast the equalizer adjustment problem as an 
estimation of a stationary state vector in Gaussian noise—a classical 
Kalman filtering problem. This resulted in a new, powerful, and rapidly 
converging equalizer adjustment algorithm. While this algorithm was 
familiar in the area of stochastic approximation theory,’ it was never 
applied to equalizers prior to Godard’s work. It was shown by computer 
simulations,’ that this coefficient adjustment algorithm converges con- 
siderably faster than the stochastic gradient algorithm and virtually 
independently of the channel used. After about N iterations, where N 
is the number of coefficients of the equalizer, the mse of the equalized 
signal is generally close to the minimal obtainable. This is an improve- 
ment by a factor of three to ten”’’"'* compared with the performance 
of the stochastic gradient algorithm. The exact improvement factor 
depends on the channel involved and on the modulation scheme used. 
Godard showed further that under certain modeling assumptions, the 
excess mse converges asymptotically, as the inverse of the number of 
iterations. 

More recently, various methods were published”! that reduce the 
computational complexity in the implementation of the Godard algo- 
rithm. These methods exploit the fact that only one new element, 


2346 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1981 


which may be a vector, is introduced in the vector of the received 
signal at each iteration. They avoid the processing of large matrices 
which are required in the Godard algorithm. Accordingly, the number 
of arithmetic operations can be reduced. For the Godard algorithm 
those grow quadratically, while for the algorithms described in Refs. 
12 to 14 they grow proportionally to the number of equalizer coeffi- 
cients—a considerable reduction for long equalizers. In Refs. 15 and 16 
all these so-called least-squares algorithms are extended to include 
fractionally spaced, complex equalizer structures. The least-squares 
lattice algorithm was also extended to a decision feedback equalizer in 
Ref. 17. Investigations regarding the implementation are reported in 
Ref. 18. 

The objective of this paper is to provide insight, on a fundamental 
level, into the rapid initial convergence of the least-squares algorithms 
relative to the stochastic gradient algorithm, Godard’s approach is 
probabilistic and aimed at minimizing the ensemble mse as rapidly as 
possible. He assumed that all the involved random variables have joint 
Gaussian distributions. For this case, the Kalman filtering technique 
gives the fastest possible convergence. Although these assumptions are 
not generally satisfied in data transmission applications, a very pow- 
erful algorithm emerged. 

In Ref. 19, a general equivalence between Kalman filtering and least- 
squares estimation techniques was exhibited. For the problem at hand, 
it implies that the algorithm obtained, while not optimal in a proba- 
bilistic sense, is the solution of a deterministic least-squares problem. 
This fact was stated for equalizers in Ref. 12. The Godard algorithm, 
as well as the ones proposed in Refs, 12 to 17, minimize the sum of the 
squared equalizer output errors under the condition that the coefficient 
vector remains constant from the start of the session to the current 
time. Note that this particular cost function is not the mse. Therefore, 
it does not explain directly why the actual mse converges so rapidly. 

One obvious reason for the fast convergence of the least-squares 
algorithms is that all the information available from the start of the 
equalizer adaptation is stored and exploited in the update procedure, 
while the stochastic gradient algorithm relies mostly on current infor- 
mation. In Ref. 11, the Godard algorithm was interpreted as a sto- 
chastic gradient algorithm, where the coefficient corrections are trans- 
formed by an estimate of the inverse of the channel autocorrelation 
matrix. The fast initial convergence was attributed there to “self- 
orthogonalization” of the equalizer adjustments. However, this can 
only account for part of the high speed. It cannot explain the fact that 
the least-squares algorithms converge considerably faster than the 
stochastic gradient algorithms under the best conditions, i.e., when an 
ideal channel is involved or, equivalently, when the channel autocor- 
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relation matrix is known exactly. 

Here, we investigate the initial convergence of the deterministic 
least-squares algorithms from an algebraic point of view and offer 
alternative interpretations for the fast initial convergence. In Section 
II, the problem is stated and certain algebraic properties of the com- 
monly utilized pseudo random start-up sequences are derived. In 
Section III, we relate the coefficient vector that results after N itera- 
tions (N being the number of coefficients) to the zero forcing equalizer 
and to the coefficients resulting with cyclic equalization.” The influence 
of the added noise is taken into account in Section IV. 


ll. THE OPTIMAL EQUALIZER COEFFICIENTS 


Let [a(k)] denote the complex data sequence which is transmitted 
at a signaling rate of 1/T over a channel with sampled impulse response 
h(nT) = h,. The samples é(nT) of the received signal, can be expressed 
as 


ioe) 


E(nT) = Y al(k)h|((1+M—k)T| + vo(nT), (1) 


=—o0 


where p(nT) denotes the channel noise, and the equalizer length 
N = 2M + 1 is an odd number. 

Let x(n) denote the complex vector of the past N received signal 
samples 


x(n)" =[E|(T)|,é|(-DTI,---E|[@—-N+VTI], (2) 


and let c(k) denote the complex coefficient vector of the adaptive 
equalizer. Then the equalizer output y(n) at time nT, using the 
coefficient vector c(k), can be written as the scalar product 


y(n) = c(k)*x(n). (3) 


The * denotes conjugate transposed vectors (matrices) or conjugate 
complex scalars. 


2.1 The mean-square criterion 


For the mean-square criterion, the equalizer coefficients are to be 
chosen such that the equalizer output y(n) approximates the trans- 
mitted data value a(n) as closely as possible. The optimum coefficient 
minimizes the mse’” 


e(k) = E[|c(k)*x(n) — a(n)|*], (4) 


where E[-] denotes ensemble averaging over the sequence [a(n)]. The 
vector Cope Which minimizes eq. (4) is given by 


Copt = Av, (5) 
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where 

A = E[x(n)x(n)*] (6) 
is the autocorrelation matrix of the channel, and 

v = E[x(n)a(n)*] (7) 


is the cross-correlation vector between the signal vector and the 
transmitted data value. In eq. (5), Copt is the coefficient vector which, 
on the average, will perform better than any other. Ref. 2 indicates 
that the solution of eq. (5) exists if the absolute square of the transfer 
function and the spectral density of the noise have no zeros. 


2.2 The least-squares criterion 


9,11-16 


The least-squares algorithms minimize the following cost func- 


tion 
2(n) = PH |e(n)*x(k) — a(k)|?’, (8) 


ie., the equalizer coefficient vector c(n) minimizes the sum of error 
squares which resulted if c(n) was used from the beginning of the 
transmission to the present instant nT. Usually, this is performed 
iteratively, i.e., c(m) is calculated recursively for n = 1 --- . Note 
that for n — ©, time invariant channels, ergodic data sequences [a (7) ] 
and noise, the cost function approaches asymptotically the ensemble 
mse, e(7). Also, the solution c(7) converges then to Copt. 

Differentiating eq. (8) with respect to c(n) and setting the derivative 
to zero yields the following set of linear equations for the coefficient 
vector c(n): 


yi x(k)x(k)*ce(n) = ¥ x(k)a(k)*. (9) 
k=1 k=l 
Although the least-squares algorithms do not attempt to minimize the 
mse, it was observed”’'-” that the mse 


e(n — 1) = E[|c(n — 1)*x(n) — a(n) |’], (10) 


which results if the equalizer’s coefficient vectors stemming from the 
previous iteration at (n — 1)T is used, converges roughly when n 
reaches N. 

To find reasons for this interesting property, we examine closely in 
the following, the solution of eq. (9) after exactly N iterations and 
derive relations between the coefficient vector c(N) and some well- 
known equalizer coefficient vectors, namely, the zero forcing and the 
cyclic equalizer. 

In the noiseless case, we show that the vectors x(1), --- x(N) are 
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linearly independent of each other for the data sequences that are 
usually used for equalizer start-up. Under this hypothesis of linearly 
independent signal vectors x(n), it can only be that the linear combi- 
nation 

N 


x(k)b(k) =0, (11) 
k=l 
if b(k) = 0 for all k = 1.--- N. Then, it follows from eqs. (9) and (11) 
that 


x(k)*c(N) = a(k)* k=1,---N. (12) 


This means that c(N) equalizes the first N received signal vectors 
ideally. All errors are zero and accordingly z(N) = 0. In the special 
case where the channel transfer function is of the all-pole type and of 
order N — 1, z(n) will remain zero for n > N and c(n) = c(N). 
Therefore, the optimal coefficient vector is found exactly when n = N. 
Although this is not the case generally, c(N) still has interesting 
properties. 


lll. ALGEBRAIC PROPERTIES OF SIGNAL VECTORS AND PARTICULAR 
EQUALIZER VECTORS WITHOUT NOISE 


Using a notation which is similar to the one used in Ref. 4, it is 
possible to express the N-dimensional vector x (2) in the noiseless case 
as follows: 


x(k) = Bdtk), (13) 
*+ Ao ---Ay «+ +Ay_jees 
where B= wn sec oehutas (14) 
ee eee ee 
and 
d(k)” =[--- ,a(k+ M), ---, a(k), --»,a(k—M), veel, (15) 


In eq. (14), Bis a stationary N X< L matrix, where L is at least the sum 
of the channel memory plus (N — 1). The center part of length 
N = 2M + 1 is shown in eq. (14). In eq. (15), d(k) is a stationary 
L-dimensional vector. . 

The vectors x(1) --- x(N) are linearly independent, if they span the 
N dimensional space. This is equivalent to the N X N matrix 


[x(1) |x(2)| «+++ |x(N)] = B[d()| --- | d(N)] (16) 


having rank N. This can only be true if both matrices on the right- 
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hand side have rank N. This is a necessary but not a sufficient 
condition. Although eq. (16) has a very particular form, namely Toe- 
plitz, it is not easy to find sufficient conditions for linear independence. 
We, therefore, investigate two interesting special cases involving par- 
ticular start-up data sequences; namely, a sequence consisting of just 
a single pulse, and the other a periodic pseudo random noise sequence. 
Before pursuing this line of attack, we need additional facts about the 
matrix B. 

For B to have rank N, the row vectors of B must be linearly 
independent. A necessary and sufficient condition is Gram’s criterion: 
N vectors bi, --- by are linearly independent if and only if 


bi*bi bi*bo-++ bi* bn 
bo* by bo*be+++ b2* bn 


eeceoeceee eevee ee jseeeece 


~ 0. (17) 


We identify 5,* as the first row of B and in general 6,,* as the nth row 
of B. Consequently, the matrix in eq. (17) is the autocorrelation matrix 
of the channel. Gram’s criterion then requires that the autocorrelation 
matrix be nonsingular. This is the same condition as the one required 
for the existence of a solution of eq. (5) in the noiseless case. Thus, 
whenever an optimal coefficient vector (in the mean-square sense) 
exists, then the matrix B has full rank N. 


3.1 Single training pulse 


Consider now the transmission of a single pulse at k = 1 + M. Then, 
inserting eqs. (13) to (15) into eq. (12) yields the following set of 
equations for the equalizer coefficient vector after N iterations: 


0 
ho: + hu-- hy-1 |* : 
h_-m-+ ho-- hm | c(N)=({1\. (18) 
Ai-n hm ho : 

0 


This is precisely the equation which defines the zero forcing equalizer.’ 
In case the peak distortion of the channel impulse response is smaller 
than one, 1.e., 


1 co 
D=— Y |Ae| = 1, (19) 
| Ao| k=—w 
k40 


the resulting equalizer minimizes the peak distortion of the overall 
channel,’ and Gershgorin’s criterion guarantees a unique solution of 
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eq. (18). This is a sufficient but not a necessary condition. 

This example shows that the least-squares equalizer adjustment 
technique yields the zero forcing equalizer in WN iterations [if a unique 
solution of eq. (18) exists], whereas the gradient technique only attains 
this asymptotically in the steady state and practically requires a 
multiple of N iterations to obtain a good approximation. 

As the time instant approaches infinity, the n equations 


x(k)*e(n) =a(k)* and kR=1,---n (20) 


cannot be satisfied simultaneously any more. In this case, c(n) will be 
determined from eq. (9). Inserting eqs. (13) to (15) into eq. (9) yields 


0 

0 
lim BB*c(n) = B}1}. (21) 
n>=oo 0 

0 


Since BB* is proportional to A as defined in eq. (6) and the right-hand 
side of eq. (21) is proportional to v as defined in eq. (7), with the same 
proportionality constant, it follows that 

lim c(n) = Copt; (22) 
i.e., the least-squares algorithms converge in the noiseless case to the 
optimal coefficients in the mse sense. 


3.2 Periodic pseudo random training sequence 


While the technique of sounding the channel through isolated test 
pulses is technically possible, a different method has found wider use. 
In Refs. 7, 8, and 20, periodic pseudo random noise sequences (PRNSS) 
were proposed and analyzed for equalizer training purposes. When 
these sequences are used, the resulting equalizer coefficient vector in 
the steady state is found to be different from the optimal one when 
random data was used. Nevertheless, the former vector approaches 
the optimal solution very closely even for short periods. If a PRNS of 
period length P = N is used, then the vectors of the sampled signal are 
periodic also and may be written as: 


E(k) = Bd(k), (23) 
Fines Fpgrate 

where B=] 10) 1" |, (24) 
hi-n ho 
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hin = ¥ Ante, (25) 


k=—00 
and 
d(k)” = |an+m, se* Ar, eee arm |. (26) 


The rows of B and the vector d (k) have dimension N. 

Analyzing eqs. (24) and (25) reveals that B isa N X N circulant 
matrix. It is well known from Ref. 7 that the eigenvalues of circulant 
matrices are determined by the discrete Fourier transform of the first 
row. Since the first row of the mentioned N x N circulant is formed by 
samples of the periodically repeated channel impulse response, it is 
concluded that B has full rank N, provided that the discrete Fourier 
transform of the periodically repeated channel impulse response has 
no zero values. 

This condition is very similar to the one stated for the existence of 
an optimal coefficient vector in the mse sense,” which, in the noiseless 
case, requires that the absolute value of the channel transfer function 
have no zeros. 

From eq. (26) and the fact that the data sequence is periodic, 
ie, a(k + N) = a(k), it follows that N successive data vectors 
d(k) ---- d(k + N — 1) form an N X N circulant matrix. Arguing as 
above, it follows in general that these vectors are linearly independent 
if the dft of the data sequence has no zero values. 

For pRNSs of period N in particular it is known that 


d(k)*d(k) =N 
d(k)*d(j) = —-1 kA). (27) 


The matrix in Gram’s criterion (17) is then a circulant with eigen- 
values 
A —_ 1 


M=eN+1 jH2-.-N. (28) 


The determinant is (N + 1)‘ ¥ 0; therefore, N successive data vectors 
of a PRNS with period N are always linearly independent. This, together 
with the fact that B has rank N, implies that any N different < vectors 
are linearly independent. Therefore, the coefficient vector after N 
iterations is given as the solution of 


x(k)*c(N) = a(k)* for kR=1,---N (29) 


and is guaranteed to exist. In Ref. 7 it was shown that for the particular 
case of N = P, i.e., when the equalizer length equals the period P of 
the PRNS, the solution of eq. (29) has an interesting interpretation in 
the frequency domain: it equalizes the channel transfer function of the 
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periodically repeated impulse response at N equidistant points. Again, 
this solution is obtained by the least-squares algorithms after N 
iterations, whereas the gradient technique obtains this only asymptot- 
ically as the number of iterations becomes large. 

If the equalizer length is smaller than the period of the PRNS, i.e., 
N < P no specific information on the nature of c(N) can be obtained. 
We, therefore, consider the solution c(P) after P iterations. Inserting 
eq. (23) in eq. (9) and using eq. (27) yields 











—1 
Z Z Pe ec | 
B(I-— D)B*c(P) = B\|P |——, (30) 
speed 
—1 
where D is a matrix containing identical elements 
I 
D;,; = Pat (31) 
Using the fact that all rows of B have the same sum 
N 0 
y Axk= D Ax, 
K=1 K=—-0 
it follows that 
BB*c(P) =i +4, (32) 
where q is a vector with identical elements 
2 
1 fo) N 00 
i= h P)- h 33 
q mo ym] DelP)- 5 hs (33) 
and 
6; = hi. (34) 
Observe that 
RL: 
k=—0 
and 


N 
> «(P)' 
k=l 
equal the dc value of the transfer function of the sampled channel and 
t ¢,(P) denotes the kth element of c(P). 
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of the equalizer, respectively. In the absence of noise and for P = ~, 
their product will be one. Then, according to eq. (33), g; = 0. Therefore, 
q; will be small even for finite P, and c(P) may be approximated as 
follows 


c(P) = (BB*)7 6. (35) 


This result may be interpreted as the optimal solution in the mean- 
square sense for a channel with an impulse response of finite duration 
P which is identical to the periodically repeated impulse response in 
the base interval |-PT/2, PT/2|. If P is large enough to span the 
channel impulse response, then it follows that c(P) is very close to the 
optimal coefficient vector after only P iterations. 


IV. THE INFLUENCE OF NOISE 


Generally, the channel noise is not negligible as assumed in the 
previous section. In the presence of additive noise the vector w(k) of 
the sampled signal can be written as 


w(k) = x(k) + r(k). (36) 


If only one single pulse is transmitted, x(X) is defined in eq. (13) and 
r(k) is the noise vector. In this case, a coefficient vector c,(n) is 
obtained after N iterations, which is the solution of the following 
equation 


0 
0 
|H+ Rica(N) = | 1}, (37) 
0 
0 
where 
ho + +huw ++ Ayn-il* 
H= i ‘ é (38) 
hi-n-+h_m ho 
and 
al Vo esses UN * 
R a ° 2 cece (39) 
Vo-N eevee VP) 
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The difference between this solution and the one given in eq. (18) 
equals 


Ca(N) — c(N) = H'Re.(N). (40) 


If cg(N) is used instead of c(N) to compute the value of the cost 
function defined in eq. (8), we obtain 


ZaA(N) = |ca(N) — e(N)|*H*H|ca(N) — c(N)|. (41) 


Substituting eq. (40) into eq. (41) and evaluating the expected value of 
the cost function yields 


E|za(N)| = E|ca(N)*R*Rea(N)| = ca(N)*E|R*R\ca(N), (42) 


where £|R*R| is N times the correlation matrix of the random noise. 
Using Parseval’s theorem, eq. (42) can be expressed in terms of the 
transfer function Ca(w) of the equalizer and of the power density 
spectrum S,(w) of the noise v(n) 
n/T 
E|za(N)| = TN/2n | | Cu(w) |?S,(w) dw. (43) 


—n/T 


Thus, increase of the cost function is N times the average noise power 
after the equalizer. This means that the average squared error (12) per 
equalized symbol is equal to the noise variance after the equalizer. 

We now examine the solution of eq. (9) for n > N. In this case, the 
equation for the equalizer coefficient vector becomes 


0 
, 0 
B(n)B(n)* + ¥ r(k)r(k)*| c(n) = Bin) | 1], (44) 
k=1 0 
0 


where B(n) is similar to eq. (14) with row length equal to n. This 
indicates that, as n becomes large, the influence of the noise grows 
proportionally. Since B(n)B(n)* converges to the channel correlation 
matrix A = BB*, which is stationary, it can be concluded that for very 
large n the influence of the noise becomes dominant. Therefore, it is 
not advisable to use the sounding technique for more than N to 2N 
iterations. 

If a PRNS with period P symbols is used during start-up and noise is 
present, the coefficient vector after P iterations is not determined 
anymore by eq. (35). Inserting eqs. (36) and (23) into eq. (9) yields 
instead . 
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P 


¥ r(k)r(k)*| c(P) = 6, (45) 


k=1 





ee 1 
BB* + 
Pa 


where it was assumed that cross products of r(k) and x(k) may be 
neglected. 

The matrix of eq. (45) will, in the mean, become the correlation 
matrix of the channel plus the noise. Therefore, the solution will, in 
the mean, be the optimal solution as given by eq. (5). Note that now 
there is no danger in letting the algorithm run for an indefinite time, 
since both terms of the matrix, as well as the right-hand side of the 
equation, grow proportionally. 


V. CONCLUSION 


The initial convergence of least-squares equalizer adjustment algo- 
rithms was analyzed to determine why the least-squares algorithms 
converge so much faster than the widely used stochastic gradient 
algorithms. The algebraic properties of the sampled signal vectors 
were found to be of crucial importance for the convergence behavior. 
In particular, it was found that, for a wide class of transmission 
channels and for commonly used data sequences, successive sampled 
signal vectors are linearly independent. This ensures a unique equalizer 
coefficient vector after exactly N iterations, where N is the dimension 
of the equalizer. In the noiseless case, this coefficient vector was found 
to correspond to particular equalizer coefficients which were reported 
and studied earlier. If a single pulse is transmitted, the zero forcing 
equalizer is obtained. If a pseudo random noise sequence with a period 
in symbols equal to the number of equalizer coefficients is used, the 
steady state solution of the cyclic equalization technique results. This 
explains why the least-squares adjustment algorithms converge much 
faster than the gradient techniques: the above-mentioned particular 
equalizer coefficients are obtained after only N iterations, whereas 
with the stochastic gradient techniques they are only approximated as 
the number of iterations becomes very large. The influence of the 
inevitable channel and measurement noises was evaluated. Approxi- 
mations show that similar performance, as in the noiseless case, is 
obtainable. 
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We consider the behavior of a general type of system governed by 
an input-output operator G that maps each excitation x into a 
corresponding response r. Here excitations and responses are R”- 
valued functions defined on a set T. To accommodate both continuous 
time and discrete time cases, T is allowed to be either [0, ~) or 
{0, 1, 2, «--}. We address the following question. Under what condi- 
tions on G and x is it true that the response r is L-asymptotically 
periodic in the sense that r = p + q, where p is periodic with a given 
period 7, and q has finite energy (t.e., is square summable)? This type 
of question arises naturally in many applications. The main results 
given (which include a necessary and sufficient condition) are basi- 
cally “tool theorems.” To illustrate how they can be used, an example 
is discussed involving an integral equation that is often encountered 
in the theory of feedback systems. 


l. INTRODUCTION 


In this paper we consider the behavior of a general type of system 
governed by an input-output operator G that maps each excitation x 
into a corresponding response 7. Here excitations and responses are 
R”-valued functions defined on a set T. (As usual, 7 is an arbitrary 
positive integer.) To accommodate both continuous time and discrete 
time cases, we allow T to be either [0, ©) or {0, 1, 2, ---}. As in Lo- 
stability theory,'® each x is drawn from a family E(L) of functions 
whose truncations belong to a set L of finite-energy (i.e., square 
summable) functions (the details are given in Section 3.1), and G is 
assumed to map EF (L) into E(L). 

We address, and give in Section 3.2 results concerning, the following 
question. Under what conditions on G and x is it true that the response 
r is L-asymptotically periodic in the sense that r = p + q, where p is 
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periodic with a given period 7, and g has finite energy?+ This type of 
question arises naturally in many applications. Our results are basically 
“tool theorems” which appear to be widely applicable. An example is 
given in Section 3.4. 


ll. MOTIVATION AND BACKGROUND MATERIAL 


To provide motivation for considering an abstract input-output 
operator G, and also to describe some earlier results related to those 
of Section III, we begin by recalling that an important example of a 
type of equation that arises in the study of physical systems (such as 
feedback systems or networks containing linear lumped and/or distrib- 
uted elements, as well as memoryless, possibly time-varying, nonlinear 
elements) is the integral equation 

t 
x(t) =r(t) + | k(t — o)¥[r(o), aldo, t=0 (1) 


0 


in which x and r take values in R” (whose elements we take to be 
column vectors), & is an n X n matrix-valued function, and Y maps 
R” x [0, 0) into R”. In eq. (1), typically x takes into account initial 
conditions as well as inputs, and r is the output (i.e., is the intermediate 
or final output) corresponding to x (see, for instance, Ref. 2, pp. 872-4 
for a specific application). Discrete-time counterparts of eq. (1) (see 
Ref. 7, pp. 449-51, for example) also arise often in system studies. 

Much is known about the properties of eq. (1), e.g., see Refs. 2, 8, 
and 9. In particular, if nm = 1 and the “circle criterion” of Ref. 2, 
together with certain associated conditions concerning k, y, x, and r 
described in the reference, are met; if f(-, £) is periodic in ¢ with some 
period 7; and if x = x, + xo with x, bounded and 1-periodic and with xo 
bounded and such that xo(t) — 0 as t > «, then we have r= p+q in 
which p is periodic with period 7, and g(t) — 0 as t > (see Ref. 2, 
Theorem 4). 

This result generalized to arbitrary n is proved in Ref. 6 by first 
showing that there is a t-periodic p defined on (—%, ©) such that, with 
x1 extended periodically for negative values of t, the auxiliary equation 


xi(t) = p(t) + | k(t—0)y[p(c),c]do, t>—-@ (2) 


{In contrast, it is standard (see, for example, Ref. 6, p. 195) to mean by r is 
asymptotically periodic that r = p + q with p continuous and as indicated, and with q 
continuous and such that its values go to zero as time approaches infinity. It is often 
possible to show without much difficulty that r is asymptotically periodic if r is L- 
asymptotically periodic and some natural additional hypotheses are met (for an example, 
see Section 3.4). 

+ The earlier related circle criteria in Refs. 10 and 11 address different issues. 
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is satisfied. Then, using eq. (1) and the fact that eq. (2) gives 


xi(¢) -| k(t — 0)¥[ p(o), aldo 


= p(t) + | k(t—o)¥[p(o), o]do, t20, 


it is proved that when x = x; + xX, we have r(t) — p(t) ~ @ast— ™, 
in which @ is the zero element of R”. A similar proof shows that if 
n = 1 (for the sake of simplifying a statement of a result) and both xo 
and s, where s(t) = ff | k(o) | do for {= 0, have finite energy, then under 
the conditions indicated above, r — p has finite energy [see Ref. 2, 
Corollary 1(a)]. The proofs in Ref. 2 are of a functional analytic nature. 
For material related in a general sense concerning systems of differ- 
ential equations, and in which a Lyapunov-function approach is used, 
see Ref. 12, pp. 210-23. Concerning more recent material, a result along 
the same lines as the one described in the preceding paragraph for 
interconnected systems® governed by a somewhat different class of 
integral equations, is proved in Ref. 13. There, too, an auxiliary- 
equation approach is used.} . 

Under reasonable conditions on k and w (see Section 3.4), the set 
E(L), previously described (and defined in Section III), contains ex- 
actly one solution r of eq. (1) for each x € E(L). We now introduce the 
typically trivial restriction that only solutions r of eq. (1) contained in 
E(L) are of interest to us. Thus, under reasonable conditions, there is 
associated in a natural way with eq. (1) a map G:E(L) — E(L) such 
that r = Gx for each x € E(L). Of course, many other examples can be 
given in which such a map G arises. 

Assuming that (-, o) in eq. (1) is independent of o and that 
(6, 0) = 8, notice that the G associated with eq. (1) has the property 
that it is time invariant in the usual sense that the response to a 
delayed input is the delayed response to the original input. (For a 
precise definition of time invariance, see Section 3.3.) This type of 
property of G, rather than the concept of an auxiliary equation, plays 
a central role in our approach in Section II]. 


lil. L-ASYMPTOTIC PERIODICITY, TIME INVARIANCE, AND 
PERIODICALLY-VARYING SYSTEMS 


3.1 Preliminary notation and definitions 


Throughout the remainder of the paper the following notation and 
definitions are used. 


+ The method used in Ref. 13 to show the existence of a periodic solution of the 
auxiliary equation is very different from that in Ref. 2. 
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The symbol T denotes either [0, ©) or {0, 1, 2, ---}. Elements of R” 
are taken to be column vectors, v’ denotes the transpose of an arbitrary 
v € R", and @ stands for the zero element of R”.£ 

If T = [0, ~), then Z denotes the set of Lebesgue measurable 
functions v from T into R” such that 


i v'(t)u(t)dt < ©, 
0 


Alternatively, when T = {0, 1, 2, ---}, L stands for the set of maps v 
from T into R” such that 


y v’(t)v(t) < ©. 
t=0 


The norm ||v|| of an arbitrary element vu of L is defined by 


oo 1/2 
|| v|| = (| viola) if T= [0, ~), 
0 
and 
bs 1/2 
lvl] = (x “oot if T= {0, 1, 2, veh 
t=0 


With this norm, LZ is a Banach space of finite energy (i.e., square 
summable) functions. 

For v:T — R” and w € T, UW.) denotes the map from T into R” 
defined by w.)(t) = v(t) for t € T with t S a, and w(t) = 8 for tE T 
such that t > w. We use E(L) to denote the “extended set” 
{u:T > R” |v) € L for w € T}, and Oz stands for the zero element of 
E(L). [Note that E(L) is the set of all maps v:T — R” when T = {0, 
1,2, s0«},J 

We say that a map H:E(L) — E(L) is causal (see Ref. 2, p. 888) if 
we have (Hv)... = [Hw ];.) for each v € E(L) and each w € T. 

For any v € E(L) and each 7 € T, v(- + 7) denotes the element w 
of E(L) defined by w(t) = v(t + 70), t € T. 

The symbol 7 denotes a fixed positive element of T, and P stands for 
the set of periodic functions {v € E(L)|v(é + 7) = v(t) fort € T}. 

A central role is played by the set S defined by S = {v € L|there is 
a uv* € L with the property that 

K 


yi u(. + kr) > v* 
k=1 


as K — o}, where — v* means convergence in norm to u*. 


+ We have repeated the definitions of T and @ for the reader’s convenience. 


2362 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1981 


Finally, for each w € T, the “delay map” D,:E(L) — E(L) is defined 
by (D.v)(t) = u(t — w) for t = w, and (D,,v) (t) = @ for t < w. 


3.2 L-Asymptotic periodicity 

We shall use the following hypothesis: 

H.1: Gis a map from E(L) into E(L) such that for any v € E(L), we 
have (GD,v)(t) = (D,Gv)(t) for t € T with t = r. 

This hypothesis is satisfied whenever G is a causal map of E(L) into 
itself that is either time invariant or periodically varying with period 
tT (see Section 3.3). Our main result is the following: 

Theorem 1: Assume that H.1 holds. Let x € E(L), and let r denote 
Gx. Then r has the form p + q with p € P and q € L if and only if 
(Gx — GD,x) € S. 

Proof: Suppose first that (Gx — GD,x) = uv for some vu € S. Let v* € 
L be such that 


K 
¥. u(. + kt) = v* 


k=l 
as K — o, We shall use the proposition that 

v*(- +7) +u(- +7) =v*(-), (3) 
which follows from the inequality 


lv%(- +7) + u(- + 7) — v*()| 


K K 
<|ju*(- +7) — ¥ vo(- + Rr) +] Y vl. + kt) — v*(-) |] (4) 


























for K = 2, and the fact that the right side of inequality (4) approaches 
‘zero as K— o, 

Let po denote r + v*, which is clearly an element of E(L). Since 
r(t) = (GD,x)(t) + v(t) for t € T, by H.1, we have r(t) = r(t — 7) + v(t) 
for t = r. Therefore, for t € T, p(t + 7) = r(t + 7) + v*(t¢+7) = r(t) 
+ v(é +7) + v*(¢ + 7). On the other hand, using eq. (8), r(t) + v(t +7) 
+ v*(t + 7) = r(t) + v*(t) = p(t) for all t € T if T = 
{0,1,2,---}, and for almost all t € T if T = [0, ~). Therefore, 
with p the element of P defined by p(t) = po(t) for ¢ € [0, 7) N T, 
we have po(t) — p(t) = 6 for all t € T if T = {0, 1, 2, ---} and for al- 
most all ¢ € T if T=[0, ©), and clearly r = p + (po — p — v*) in 
which (po — p — v*) EL. 

Suppose now that r = p + q with p € P and q € L, and let u = 
(Gx — GD,x). For t = 7, u(t) = r(t) — r(t — 7) = p(t) + g(t) - 
p(t — 7) — g(t — r) = g(t) — g(t — t) which, together with u € E(L), 
shows that u € L. 

Let u(-) in L be defined by 
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K 
u(t) = ¥ u(t + kr) 
k=l 
for t € T and any positive integer K, and let J be an integer such that 
J > K. Using u(t + kr) = q(t + kr) — g[t + (Rk — 1)r] for k = 1 and 
t € T, we have, for ¢ € T, 


J K 
u(t) —u®(t) = ¥ u(t + kt) — ¥ ult + kr) 
k=1 k=1 
J 
= YY u(t+ kr) 
k=(K+1) 


= g(t + Jr) — q(t + Kz). 


Thus, || uw” — uv || < ||q(- + Jz) || + l|q(- + Kz) ||. Since ||q(- + Kz) || 
—+ 0 as K > o, {uf C L is a Cauchy sequence, and, by the 
completeness of L, there is a u* € L such that ||u™ — u*|| > 0 as 
K— o, This concludes the proof. 


3.2.1 Comments 


The following example shows that S is a proper subset of L. Let 
G be the identity operator on E(L), take n = 1, and let x € E(L) 
be defined by x(t) = In 2 for t € [0, 2] NM T, and x(t) = In ¢ for 

E (2, 0) N T. Let r = 1. Then, (Gx — GD,x)(t) = r(t) — r(t -— 1) = 
In[t(¢ — 1)~*] for ¢ € [8, ©) NM T. Using the inequality n(1 + o) So 
valid for o = 0, we see that In{t(¢ — 1)~"] S (¢ — 1) for t E [3, ~) N T, 
and therefore that v, defined by v(t) = (Gx — GD,x)(t) for t € T, 
belongs to L. Since here Gx cannot be written as p + q with p € P and 
q EL, it follows from the theorem that v € S. 

It is not difficult to verify that the proof given of the theorem can be 
modified to show that H.1 can be replaced with the following somewhat 
weaker hypothesis. 

H.1': G:E(L) — E(L) is a map such that for any v € E(ZL), there is an 
s € S such that (GD,v)(t) = (D,Gv)(t) + s(t) for t € TN [1, ©). 

The simple example: n = 1, (Guv)(t) = v(t) + e~ for ¢ € T and each 

v € E(L) is one for which H.1’, but not H.1, is met. 


3.2.2 Corollaries (the use of weighting functions) 


In this section, and in the Appendix, w denotes any function from T 
into R' such that there is a constant B > 0 for which w(t) = (1 + Bt)” 
when ¢ € T, and such that w is measurable on JT and bounded on 
bounded subsets of T if T = [0, ©). By wv, where v € E(L), we mean 
the element of E(L) defined by (wv) (t) = w(t)u(¢) for ¢ € T. 


Corollary 1: Suppose that H.1 is met, that x © E(L), and that 


2364 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1981 


w(Gx — GD,x) € L. Then Gx = p + q for some p € P and some 
gEL. 

Proof: Let h = w(Gx — GD,x), and let s denote (Gx — GD,x). Observe 
that s € L. For any positive integers J and K with J > K, 


























J K Bs 
Y s(- + kr) — ¥ s(- +k) =|] YM s(- + kz) 
k=1 k=1 k=K+1 
J 
= ¥ |s(- + kr) 
h=K+1 
J 
= } (t+ kBr)? || h(- + kr)|| 
k=K+1 
J 
< } (1 +kBr)* |All, 
k=K+1 
which shows that 
Z K 
¥ s(- + kr) — ¥ s(- + kr) |} — 0 
k=1 k=1 














as J and K approach infinity. By the completeness of L, we have 
s € S and the corollary follows. 

In Corollary 2, below, w(- + 7)[x(- + 7) — x(-)] denotes the element 
of E(L) whose values are w(t + 7)[x(t + 7) — x(t)]. 
Corollary 2: Assume that H.1 is met, and that there is a positive 
constant p such that 


[| w(Gu — Gv) (|| = pw — v) wl (5) 


for u and vu in E(L) and w € T. If x € E(L) ts such that w(- + 7) 
-[x(- + 7) — x(-)] © L, then Gx = p + q for some p € P and some 
gqEL. 

Proof: We have || w(Gx — GD,x),.)|| = p|| w(x — D,x)( || for w & T and 
any x © E(L). When w(- + 7)[x(- + 7) — x(-)] € L, it follows that 
sup.er|| w(x — D,xX) («|| < %; hence, w(Gx — GD,x) € L. By Corollary 
1, Gx = p + q with p and q as indicated. 


3.2.3 Comments 


The condition that w(- + 7)[x(- + 7) — x(-)] © L is met if 
~ suprer[w(t + 7)/w(t)] < © and x = po + go with po € P and wq@ € L, 
and of course supre7[w(t + 7)/w(t)] < © is satisfied if, for example, 
w(t) =e™ for t € T or w(t) = (1 + At)® for t € T, with d a positive 
constant. Input-output stability theory techniques can frequently be 
used to show, in specific cases, that eq. (5), with an appropriate w, is 
met. 

Regarding the case in which T = {0, 1, 2, ---}, since t could have 
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been taken to be unity, Theorem 1 and Corollaries 1 and 2 provide 
conditions under which r is L-asymptotically constant in the sense 
that r=c+qwithg € L andc€ C, where C is the set of constant R”- 
valued functions {v € P|v(t) = u for t € T and some vE R"}. 
Corresponding results for T = [0, ©) are given in the Appendix. 

3.3 Time invariance and periodically-varying systems 


Hypothesis 1 plays a prominent role in Section 3.2. Here we give 
definitions which make precise the essentially self-evident proposition 
that H.1 is met if G is a causal map of E(L) into itself that is either, in 
the usual sense, time invariant or periodically varying with period r. 

Let H be an arbitrary causal map of E(L) into E(L). 


Definition 1: H is time invariant if (i) there is an element v of R” such 
that (H6@z)(t) = v for ¢t € T, and (ii) for any x € E(L), we have 


(HD,,.x)(t) = p, tE[0,0) NT 
= (D,,Hx)(é), tE[w,o) AT 
for each w € (T — {0}). 


Definition 2: H is periodically varying with period 7 if (t) H@z = v for 
some v € P, and (it) for each x € E(L) and any positive integer k, 


(HD;,x)(t) = v(t), tE[0,k7T)N T 
= (D,,Hx)(t), tE[kr, eo) NT. 
Notice that H is “periodically varying” with period 7+ if H is time 
invariant. A related definition is the following: 


Definition 2': H is periodically varying with period 7 if (t) H@z = v for 
some pv € P, and (ii) for any x € E(L), we have 


(HD,x)(t) = v(t), tE[0,7) NT 
= (D,Hx)(t), t€[7r, ©) NT. 


To see that Definitions 2 and 2’ are consistent, we observe the 
following: If H meets the conditions of Definition 2, then obviously H 
satisfies the conditions of Definition 2’. On the other hand, if H meets 
the conditions of Definition 2’, and x € E(L) is given, and if 


(HD;,x)(t) = v(t), tE [0,kr) NT (6a) 
= (DrrHx)(t),  t€[kt,0) NT (6b) 


for some k, then, by the conditions of Definition 2’ with x replaced 
with D;,x, 


(HD 41)7x)(t) = v(t), tE [0, )yNT 
=(D,HDr-x)(t), t€[r,o) AT. 
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Since HD,,x has the values given by eqs. (6a) and (6b), we see that 
(HD e+1)7x) (t) = v(t), tE[0, (k+1)7) NT 
(Da+iyr-Hx)(t), t€ [(k + 1)7, ©) 9 T, 


I 


which shows that the conditions of Definition 2 are met. 

Notice that our assumption that H is causal is not explicitly used. 
That assumption restricts the class of operators H so that the defini- 
tions given above are appropriate and natural.+ 


3.4 An example 
Let T = [0, ~), and consider eq. (1) which is repeated below. 


x(t) = r(t) + k(t — o)¥[r(o), aldo, t=0. (1) 
0 


Assume the following, in which L, denotes the set of functions from 
[0, ©) to R' that are summable over [0, ~). 


A.1l: x € E(L), k is a measurable real n X n matrix-valued function 
defined on [0, ©) such that each &;; is bounded and belongs to Li, and 
y is a map from R” X [0, ~) into R” with the properties that 
(4, o) = 6 for o = 0, and 

(i) there is a constant c > 0 such that |¥(u, t) — Y(v, t)| S 
c|u — v| for all u, v € R” and all t = 0, in which |-| is some norm on 
R", and 

(ii) U[z(-),; -] is measurable on [0, ©) whenever z € E(L). 

Since x € E(L) and each k;; € Ih, it follows that u defined by 


u(t) = | k(t — o)¥[x(o), aldo, t=>0 
0 


is an element of E(L). Also, since each k;; is bounded, there is a 
constant Co such that | k(t — o)[W(21, «) — W(z22, o)]| = co| 21 — Z2| for all 
nonnegative ¢ and o such that ¢ = o, and for all z; and 22 in R”. These 
two observations show that a proof given by Tricomi (see Ref. 8, pp. 
42-7) can be modified to prove that E(L) contains a unique solution r 
of eq. (1).¢ 

Let G be the map of E(L) into E(L) defined by the condition that 
for each x € E(L), r = Gx is the solution in E(L) of eq. (1). Since 


+ Although the concepts involved are obviously well known, it appears that Defini- 
tions 2 and 2’ have not actually been given earlier. Also, Definition 1 is not entirely 
standard. For example, in Ref. 4, p. 20, time invariance requires that v = 0. 

+ The integral on the right side of eq. (1) can easily be shown to be an element of R” 
for each ¢ whenever r € E(L). Since the value of the integral for a given ¢ is unchanged 
if r is replaced by any element of E(L) that agrees with r almost everywhere, eq. (1) has 
a solution if there is an element E(L) that satisfies the equation almost everywhere, 
and, moreover, any solution r € E(L) is unique and not merely essentially unique. 
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w(O, t) = @ for t = 0, it is easy to see that H.1 is met when 
W(z,t+ 7) = (2, t) for t= 0 and z€ R”. 
Now consider four additional assumptions. 
A.2: W(z, t) = U(z, t+ 7) fort =0 and allzeE R”. 
A.3: For any 2a and z, in E(L), there is a measurable real n X n 
matrix-valued function D defined on [0, ©) such that (z) each D,; is 
bounded on [0, ©), (iz) Y[za(t), #] — Y[z.(t), t] = D(H[za(t) — 22(t)] for 
t = 0, and (zzz) the relation 
t 
Xo(t) = ro(t) + | k(t — o)D(a)ro(a)do, t=0 (7) 
0 
implies that we have ro € L whenever ro € E(L) and xo € L. (See Ref. 
2, pp. 876-8 for conditions under which A.3 holds when w has a certain 
important specific form.) 
A.4: For each i and J, t?ki; € L, for p = 1, 2. 
A.5: Concerning eq. (1), x = uw, + Ue with w € P and t?u2 € L for 
p = 0, 1, 2. 
We shall prove the following. 
Theorem 2: If A.1 through A.5 hold, then E(L) contains a unique 
solution r of eq. (1), and we have r = p + q for some p € P and some 
qEL. 
Proof: As indicated earlier, A.1 implies that there is a unique solution 
of eq. (1) in E(L). Let r and s denote Gx and GD,x, respectively, and 
let D satisfy Y[r(t), t] — ¥[s(6), t] = D(t)[r(t) — s(t)], t= 0 with D such 
that (z) and (zit) of A.3 hold. Then, with A = r—s andv =x — D,x, 
t 
u(t) = A(t) + | k(t — o)D(c)A(c)do, t= 0. 
0 
Note that v(t) = x(t) for ¢ € [0, 7), and v(t) = ue(t) — ue(t — 7) for t= 
t, from which it easily follows that (1 + ¢)?uv € L for p = 0, 1, 2. 
By A.3, A € L. In addition, observe that we have 


(1 + ¢)u(t) = (1 + f)A(Z) 


+ i k(t — 0) D(a) (1 + o)A(a)do 
0 


+ | (t — o)k(t — 0) D(c)A(c)do, t= 0. 
0 


Since ¢k;; € L; for all i andj, I; given by 
+ By t?k:; we mean, of course, the map from [0, ©) into R' whose value at ¢ is ¢?;;(t). 
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I(t) = | (t — o)k(t — o)D(o)A(o)dr, t=0 


belongs to L. Thus, by A.3, (1 + ¢)A € L. Similarly, 
t 
(1 + ¢)?v(t) = (1 + t)?A(t) + i k(t — «)D(o)(1 + «)*A(o)do 


0 


+2 | (t — o)k(t — o) D(a) (1 +’ ao )A(a)do 


+ | (t — o)*k(t — 6) D(o)A(a)do, t= 0, 
0 


together with the hypothesis that A.3 holds and that t?k;; € L; for all 
i and j, shows that (1 + t)?A € L. By Corollary 1, r= p + q with p and 
q as indicated. 


3.4.1 Comments 


Under the conditions of Theorem 2, it can be shown that the integral 
on the right side of eq. (1) depends continuously on ¢ for ¢ > 0. Thus, 
if u; and u, of Theorem 2 are continuous, then so is r. 

Concerning the standard concept of asymptotic periodicity (see Ref. 
6, p. 195 and refer to the footnote in Section I), arguments of the kind 
used in Ref. 14 show that r is asymptotically 7-periodic whenever x is 
asymptotically r-periodic, the conditions of Theorem 2 are met, and k 
satisfies the additional assumption: 

A.6: Each tkj; is bounded on [0, ~).+ 

More specifically, let A.6 and the conditions of Theorem 2 be met, 
and let p and g be as described in Theorem 2. Then, with y[ p(o), o] 
defined on (—%, 0) by periodically extending y[ p(o), o] on [0, 7), the 
integral 


| k(t — o)¥[ p(o), o]do 


exists as an element of R” for each ¢ (see Ref. 14, pp. 2852-3). This 
integral is periodic in ¢, and it can be shown to be continuous in ¢. 
These facts can be used to verify that when A.1 through A.6 are 
satisfied, 


| k(t — 0)¥[ p(o) + g(a), aldo, 
0 


+ This hypothesis and A.5 imply that each (1 + ¢)&;; is square summable over [0, &). 
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which is continuous in ¢ for ¢ > 0, can be written as v; + Ue with v; 
_continuous and 1-periodic and with v.(t) ~ @ as t > o. The rest is 
obvious. 

The proof of Theorem 2 involves the use of a quadratic weighting 
function w. A similar result can be proved using an exponential 
weighting function. Specifically, suppose that A.1 holds, that there is 
an a > 0 such that A.3 is met with k replaced with e“‘k, and that A.5 
holds with the integrability conditions on uz replaced by the require- 
ment that e“‘u2 € L. Then, since 

t 
e“v(t) = e“A(t) + i e** R(t — o)D(a)e*’A(o)do, t=0 


0 
we have e“A € L. 


IV. APPENDIX 


Throughout this appendix, 6 denotes an arbitrary positive constant, 

C stands for the subset of # (L) whose elements are constant R”-valued 
functions, T = [0, ~), P(w) denotes the set of periodic functions 
{vu € E(L)|v(t + w) = v(t) for t € T} for each w > 0, and S(w) is de- 
fined by S(w) = {v € L|there is a v* € L with the property that 

K 

yi u(. +kw) > v* 

k=1 


as K — o} for any w > 0. 

Consider hypothesis H.2 below. 

H.2: T = [0, ©), and G is a map of E(L) into E(L) with the following 
property: For each v € E(L) and each w € (0, 5), we have 
(GD,,v)(t) = (D,,Gv) (t) for t = w. 

Theorem 3: Let H.2 hold, and let x € E(L). Then Gx has the form 
c+quwithc € Candq EL if and only if (Gx — GD,x) € S(w) for 
w E (0, 8). 

Proof: By Theorem 1, (Gx — GD,,x) € S(7o) for any 7> € (0, 6) when 
Gx has the form indicated. 

On the other hand, suppose that (Gx — GD,.,x) € S(w) for w € (0, 8), 
and let vo € (0, 6). By Theorem 1, Gx = p,, + ¢,, withp,, € P(to) and 
q:, © L. Similarly, for any integer m > 0, and with 7; = 70/m, we have 
Gx = p,, + q,, for some p,, © P(t:) and some q,, € L. Notice thatp,, 
and, therefore, (p,, — P-,) belong to P(7o), and hence have Fourier 
series expansions. Since (p,, — p;,) also belongs to L, and m > 0 is 
arbitrary, it easily follows that there is a u € R” such that p,,(t) = u 
for almost all ¢ => 0. This completes the proof. 

+ Both quadratic and exponential weighting functions have been used earlier for the 


different purpose of obtaining criteria for the boundedness of solutions of equations (see 
Refs. 2 and 7, and, for example, Refs. 5 and 6). 
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Theorem 3 and the material in Section 3.2.2 can be used to imme- 
diately obtain the following two results. 


Corollary 3: Assume that H.2 is met, that x € E(L), and that 
w(Gx — GD,x) € L for w € (0, 8) (w is defined in Section 3.2.2). Then 
Gx=c+quwithcECandgeL. 


Corollary 4: Suppose that H.2 is satisfied, that w (see Section 3.2.2) 
satisfies supizo[w(t + w)/w(t)] < % for w € (0, 5), that there is a 
constant p > 0 such that || w(Gu — Gv).)|| S p||w(u — v)«)|| for u and 
v in E(L) and w > 0, and that x = co + qo with co € C and wq E L. 
Then Gx = c + q for some c € C and some q € L. 
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We study a time-compression (or expansion) technique for possible 
application in communication signal processing, e.g., broadcast- 
quality TV transmission through satellites. The method uses a linear 
chirp, a linear dispersive filter realized by surface acoustic wave 
devices and an envelope detector. This technique is heuristic and can 
be viewed as a quasistationary model of the FM wave involved. 
Numerical results show that excessive distortion is created, and its 
application to TV transmission is not suitable unless some kind of 
equalization is provided. One such form of equalization is the chirp 
transform processor which involves considerably more complexity. 
Simpler equalizations may be possible but do not seem to be straight- 
forward. 


l. INTRODUCTION 


We study a time-compression technique motivated by the long- 
standing interest in transmitting multiple broadcast-quality color Tv 
signals through a single satellite transponder, i.e., a usable RF band- 
width of 36 MHz in a communications satellite such as COMSTAR. 
This can be done by the use of frequency division multiplexing (FDM). 
However, the nonlinearity of the transponder can cause serious intel- 
ligible crosstalk and intermodulation interference between the FM 
carriers unless the satellite power amplifier is backed off substantially. 
Such a backoff, in turn, leads to a reduction in the downlink carrier-to- 
noise ratio. As a result, there exists an optimum trade-off between the 
crosstalk and s/n’s, which limits the overall system performance, and 
achieving broadcast-quality Tv transmission becomes difficult. 

It is possible to time compress each scan line of a color TV signal by 
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the use of a linear chirp, a linear dispersive filter (LDF) and an envelope 
detector. Two or more time-compressed scan lines from different, but 
synchronized, TV signals can then be time multiplexed together in the 
time duration of an ordinary TV scan line. This concept of time- 
compression multiplexing (TcM) is not new,’” but recent advances in 
fast analog-to-digital converters, digital-to-analog converters and 
charge-coupled devices have greatly facilitated the implementation of 
time compression or expansion. However, because of their present 
limitations on bandwidth and speed, time-compression factors for 
achievable Tv signals are only 2 or 3. For large time compressions, 
LDFs realizable by surface acoustic wave (SAW) technology are prom- 
ising candidates because of their large time-bandwidth property. In 
addition to high speed, the Tv application also requires extremely high 
signal fidelity. There are other applications on the high-speed time 
expansion (or compression) of waveforms where the distortion require- 
ment is less stringent than the Tv transmission case. In the specific 
case of multiple Tv transmissions through a single satellite transponder, 
there are many advantages in using TcM, e.g., higher transponder 
efficiency, no intermodulation, no crosstalk, possible compatibility 
with time division multiplex (TDM) operations, etc. The crucial ques- 
tion is, of course, how much distortion the compression/expansion 
process would introduce on the signals. This paper gives both analysis 
and numerical examples that illustrate the method. 

The study revealed that considerable distortion is introduced by 
these operations, and its application to broadcast-quality TV transmis- 
sion would require SAw filter performance beyond the present state of 
the art. However, if the distortion requirement can be relaxed some- 
what, then the present approach is advantageous because of its sim- 
plicity. On the other hand, if high signal fidelity is required, then some 
kind of equalization is needed for the present technique. This has 
motivated the study of an extension of the present method which is 
capable of producing high signal quality with saw filter requirements 
within the present state of the art even at compression factors of 10 or 
more, but at the expense of higher complexity. This latter development 
is not discussed here but is covered in Ref. 3. The remainder of the 
paper covers the theoretical analysis and computer simulation. How- 
ever, the discussion of either subject by itself is not adequate for the 
complete understanding of the system. The theoretical analysis estab- 
lishes that although the basic concept was derived heuristically 
through physical interpretations, it can be viewed as a quasi-stationary 
approximation to the time-compression process. The computer simu- 
lation, on the other hand, provides the quantitative results that lead 
to the conclusion that the resulting distortion is excessive for today’s 
sAw filter parameters. 
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Il. THEORETICAL ANALYSIS 


In this section, we describe and analyze the proposed compression 
method using TV as an example. We first describe a heuristic argument 
of how the technique is supposed to work. We then derive the impulse 
response of a general L>F—the understanding of which is important to 
the subsequent analysis and simulation. A brief step-by-step analysis 
of the compression process is shown, and its result reveals that the 
technique can be interpreted as a quasi-stationary approximation of 
the chirp signal. The mathematical expressions describing the time 
compressor are complicated; thus, numerical results are obtained using 
a computer simulation discussed in Section III. Various other proper- 
ties are also discussed. 


2.1 A physical interpretation 


A block diagram is shown in Fig. 1. The input signal v(t) consists of 
successive scan lines, each with a duration of JT; seconds, and the 
voltage is biased to be positive. It is multiplied synchronously by a 
periodic chirp signal c(t) with a center frequency fo and a chirp range 
of Af.. The instantaneous frequency of c(t) sweeps linearly from 
(fo — Afc/2) to (fo + Af-/2) over each scan line duration T;. The lowest 
frequency of the chirp signal is assumed to be much greater than the 
highest frequency in the Tv signal. The input x(¢) to the LDF is then 
an amplitude-modulated chirp waveform. For simplicity, let us restrict 
our attention to a single scan line, say (0, T;). In this interval, x(t) 
chirps from (fo — Af./2) to (fo + Af./2) with the TV signal as the 
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Fig. 1—Time-compression filter. 
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envelope modulation. As for the LDF, we assume that it has a band- 
width Af centered at fo, and Af= Af.. 

Again, for the purpose of a simple illustration, let us assume that 
Af = Af. and, over its passband, the LpF has a constant gain and a 
linear group delay characteristic decreasing from Ar to zero, where Ar 
is the delay dispersion of the LpF. At the time instant ¢ = 0, x(t) has 
an instantaneous frequency (fo — Af-/2) and an envelope magnitude 
proportional to v(0). This “envelope piece” is delayed by Ar as it 
transits through the LDF. Similarly, at ¢ = T), the instantaneous 
frequency of x(t) is the highest chirp frequency which gives a zero 
delay for the passage through the Lpr. In between these two end 
points, the envelope of x(t) is delayed linearly from Ar to zero. 
Equivalently, the envelope of x(t) over the interval (0, 7;) is time 
compressed into (Art, 77) in the LDF output. An envelope detector is 
then used to retrieve the time-compressed TV signal. Similar compres- 
sion occurs for all the other scan lines, and as a result, the envelope 
detector output consists of a sequence of compressed Tv bursts. If we 
denote the duration of each burst by T., the ratio (T)/T-) is called the 
time-compression ratio (TCR). 

It is easy to show that, in general, for a given set of T;, At, Af, and 
Af. cy 





Ar 
c= aaa teres Cy 1 
T.=T; n 7Af (1) 
where 0 s T. = T;, Af. = Af, and the compression ratio is 
Ar Af. +t 
Tor = (1-556) ; (2) 


It is clear from the above description that time expansion is also 
possible by the use of an increasing delay characteristic in the LDF, 
and in such a case, the filter will become a time-expansion filter. 


2.2 Impulse response of a general LDF 


We consider the general case of an idealized LDF as shown in Fig. 2a. 
The bandwidth of the filter is A f centered at fo. The group delay varies 
linearly over the passband as shown in the diagram. The transfer 
function of the filter, using analytic-signal notation, is 


exp =/2n) (f= fo) T0 “Ee + a 
a 


Bl eg cae! 


0, elsewhere, (3) 
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Fig. 2—Characteristics and impulse response of an LpF. (a) Transfer function. 
(b) Impulse response. 


where 


A 
aa ct (4) 


T is the group delay at fo, and ¢: is a constant. The inverse Fourier 
transform of H(f) gives the impulse response A(t). The derivation for 
A(t) is given in Appendix A. Neglecting some small envelope and phase 
perturbations, a good approximation (sufficient for our present consid- 


eration) for h(t) is 
A 
cos 2n| (6 — al + ; t? + | 


A 
h(t) = w= stants, 


0, elsewhere, (5) 


where the multiplying constant has purposely been dropped, and ¢2 is 
a constant. A sketch of A(t) is shown in Fig. 2b. It is a linear chirp 
waveform starting at 7 — Ar/2 and ending at 7 + Ar/2, with the 
instantaneous frequency varying from fo — A f/2 to fo + Af/2 at a chirp 
rate of a. 


2.3 Analysis of time compression 


In this analysis, we again restrict our discussion to a single scan line 
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for simplicity. We denote this input scan line by A(¢) in the interval 
(0, T). Referring to Fig. 1, the linear chirp signal is given by 


c(t) = cos 2n| (i - als 





Jer beso 0O<t=<=T, (6) 


where £ is the chirp rate defined by 


_ Ak 
B=a> (7) 


and ¢3 is a constant. The parameters fo and Af. are the chirp center 
frequency and deviation, respectively. There are also two implicit 
assumptions: (Z) fo >> Af; and (it) fo >> the highest frequency in the TV 
signal. 

The LDF input is the product of the Tv input A(t) and c(é), ie., 


x(t) = A(t)cos 2n| (-SE\e+ bee + oo O=tsT. (8) 





To obtain the LDF output, we can either use the time-domain approach 
by convolving x(t) with the impulse response A(t) of the LDF, or use 
frequency domain analysis by multiplying the Fourier transform of 
x(t) by the LDF transfer function and then performing an inverse 
transform. The former method is much simpler and is presented here. 
The latter is tedious, offers no additional insight, and is, therefore, 
deleted for brevity. 

The LDF is assumed to have an extended passband over ( fo — A f/2, 
fo + Af/2), where Af > Af.. Its delay characteristic is shown in Fig. 3. 
Note that the delay slope is given by 


DELAY 





FREQUENCY (f) 


Fig. 3—Delay characteristic of the LDF. 
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ae Ar 
a= Af (9) 


Also note in Fig. 3 that the delay at fo + At/2 is zero, as some constant 
delay through the device has been dropped for simplicity. Using the 
result of Section 2.2, the impulse response of the LDF is 
a 2 
cos 27 ht—5¢ +dos4|, Osta Az, 


h(t) = 
0, elsewhere, (10) 


where f; 4 fo — Af/2, and ¢4 is a constant. The LDF output is then 


y(t) -| x(r)A(t — 7)dr 


-{ 4¢ 
7 2 


x{ cos2a| 0) + (at—Af.— fat fa)t + (B- a+ 





. 
cos 2a] Olt) (2fo— at + fe — fs)t + (B + a) =|} o- 


0st=T, + A, (11) 

where the limits of integration are defined by 
T, 4 max(0, t — Ar), (12) 
T>2 4 min(t, T)), (13) 


fs and f, are defined in Fig. 3, and 
B(t) A fat - 5 t? + du. (14) 


A constant phase term has been dropped in (eq. 11), and we will 
neglect all unimportant constant multipliers and phase shifts in sub- 
sequent discussions. The integral of the second cosine term in eq. (11) 
can be discarded because of the high frequency component 2 for in the 
integrand. Therefore, y(t) becomes 


T2 
y(t) -| A(r)cos 2n| O10 + (at — Af. — fr + fa) 7 


T, 


+ (B-—a)=— ler O0s¢=T7,+Ar. (15) 
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Let us examine the above integral expression carefully. Using analytic- 
signal notation, we rewrite it as 


T2 
y(t) = | A(t) exp] 24] 8 + (at —Afc — fa + fa) 


Ty 


+ (B — a) Jer} 


T2 


= exp j27 P(t) i 


Ty 


A(7) exp] 72a] (at — Af. — fa + fa) 


2 
ae =a) let. 
O0st<T,+ Ar. (16) 


In this form, we can define a complex envelop for y(t) as 


T2 
Ay,(t) A | A(r) exp| 2 (at —-Af-—fa+fa)t 


T 
2 
+ (B - a) 5 |. O<¢<7)+Ar. (17) 


In the above, note that A(r) is a slow-moving function while the 
exponential term contains a highly oscillatory chirp given by the 
derivative of the bracketed argument with respect to 7, i.e., 


filt) = (at -Afe—fa+fa) + (B-—@)7r, iS tST2. (18) 


This can also be obtained by simply taking the difference of the 
instantaneous frequencies of x(r) and A(t — 7) in eq. (11). The convo- 
lution integral involved is illustrated in Fig. 4, where we show x(t) and 
also the corresponding f;(r) at a fixed ¢t = ft). It can be seen that A,(t,) 
is given by an integral over (71, T2) of a linear chirp waveform at a 
chirp rate of (8 — a) and with an envelope modulation A(7). Further- 
more, the chirp frequency /; inside (71, T2) may vanish at some 7 as 
shown in Fig. 4. In such a case, the value of y(t¢,) is dominated by the 
integral eq. (15) over the small interval surrounding that 7, where f; 
goes to zero. This is, of course, the well-known quasi-stationary ap- 
proximation. The approximation is good if the chirp rate (8 — a) 
and the interval (T}, T2) are large, and A(r) variation is slow by com- 
parison. Using this approximation, at ¢ = ¢, inside the valid interval 
(0, T; + Ar), 


Ay(t,) = RA(71), (19) 
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where k is a constant, and 7 is obtained by solving eq. (18) with f; set 

to zero and t = t, Le., 

at — Af. = fs + fs 
B-a 

This quasi-stationary approximation is indeed equivalent to the phys- 

ical interpretation described in Section 2.1. As a check, let us derive 

the TcR from the approximation above. We know that A(7) is nonzero 


only if 0 = 7; = T;. The corresponding ¢; can be solved for using eq. 
(20), and the end points of tf; constitute the interval T,, i.e., 


Bit eid eR) |. ie i) 


(20) 


11> 


T. 
a a 
= ni(1 - ‘). (21) 
a 
Therefore, 
B -1 
TCR = (.-4) ; a= B, (22) 


which agrees with eq. (2) with the substitutions of a and f according 
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to the definition eqs. (4) and (7), respectively. See Section 2.4 for a 
continued discussion of A,(é). 

To finish our analysis of the time compression, we return to y(¢) in 
eq. (15) and expand the cosine term: 


y(t) = y-(t)cos 27@(t) — y.(t)sin 27 V(t), (23) 


where 
2 


T2 
yt) A [ A(rt)cos 2n| (at — Af. — fa + fa) + (B — a) len (24) 


Ty 


and 


T> 2 
ys(t) A | A(r)sin 2a (a — Af. — fa + fs)t + (B — a) sla (25) 


T, 


In eq. 25, y-(t) and ys(t) can be viewed as the in-phase and quadrature 
components of y(t) after synchronous demodulation. The envelope 
detector output is then 


2(t) = [y2(t) + yi(t)]'”. (26) 


2.4 Some fundamental properties 


We have just derived mathematical expressions for the time-com- 
pression process. These expressions are too complicated for easy 
interpretation. However, we have demonstrated that the technique 
works if a quasi-stationary approximation is made for the chirp 
waveform. In other words, the instantaneous frequency of the chirp 
wave could be used as if it were a stationary carrier frequency in a 
steady state analysis. . 

Making such an assumption, it is seen from eq. (2) that infinite 
compression, TCR = ©, results if Af = Af, and Ar = T; (a = £). But 
from eqs. (15), (24), and (25), it is obvious that the LDF output after 
synchronous detection (for a = £) will actually become the Fourier 
transform of the input envelope A(t). This can also be recognized as 
the well-known chirp transform or real-time transform*’ commonly 
used in chirp radar and SAw processors. Therefore, the quasi-stationary 
model is invalid for this case. 

A case where the quasi-stationary approximation clearly holds is 
where Af, > » and a”' => 0, i.e., the chirp range is very large, and the 
delay slope of the LDF is close to zero. The result is, of course, a very 
slight compression, i.e., the TCR is slightly larger than 1. Therefore, 
without doing any specific calculation, we see that the quasi-stationary 
assumption is valid, at best, for small TcRs, and it breaks down 
somewhere between TCR = 1 and o. Since our practical applications 
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require TCR => 2, the case of TCR = 2 becomes most interesting and is 
investigated in the simulation later. 

Let us now assume the quasi-stationary model and see what kind of 
distortion would result. The input to the LDF can be expressed as a 
multitone AM signal: 


N 
x(t) = (: +>» mycoses Joost (27) 
j=l 
where w,; are the angular modulating frequencies, and «, is the angular 
carrier frequency at some fixed instant of time. Let 7, be the delay of 
the LDF at w = w,. The phase characteristic of the LDF is 


$(w) = —(Te + &-/a)w + w’/(2a) +c, (28) 


where c is a constant, and a is defined in eq. (9). The LDF output is 


2 
x(t) = cos ot + (Te + We/a)(—We) + = + | 


N 2 
m; We et ; 
+ y — { os] aa w jt — (* + Noo + w)) jeer c 
y=l 2 Qa 2a 


ne NS 
+ cos at —wyt- (« + No. — w;) 4S Ons, |} (29) 
Q 2a 


In the above expression, various sidebands of the input AM signal are 
delayed asymmetrically about the carrier frequency. This would, of 
- course, distort the signal. After some tedious manipulations, the en- 
velope of y(t) is found to be 


ie a a ae 
Y(t) = {[ + ¥ mjcosw(t — reos( $2) | + 


j= 


N 2\ 72) 1/2 
¥ m;cosw (t — rosin( 22) , (30) 
j=1 2a 


Comparing Y(t) and X(t), it is obvious that distortionless transmission 
results only if w//(2a) « 1. More importantly, the distortion in Y(t) is 
dependent on the input signal and, therefore, cannot be equalized 
easily. On the other hand, if synchronous demodulation is used in 
place of envelope detection, we obtain the first square bracket term in 
eq. (30), i.e., 


= ¥. Ww? 
Y,(t) =1+ ¥ mjcosw; (t — reos( 2) : (31) 
j=l 2a 


where the distortion shows up as cos[w3/(2a)] which is independent of 
the input signal, and equalization is thereby feasible. To do synchro- 
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nous demodulation, we need to know the instantaneous frequency of 
the carrier in y(t), the LDF output, which we have not addressed so far. 
Let us do so in the following. 

Stating eq. (16) again, 


T2 2 
y(t) = exp[j27®(t) | A(r) exo) tb — a) 4 
T, 


-exp[ J27(at — Aft — fa + fs)t]d7. (32) 


In the above we have, in part, an output chirp frequency of d®/dt = 
(fs — at). The integral part, on the other hand, can be interpreted in 
two different ways: (i) It is a “quadratic” chirp transform of A(r). As 
such, little is known about this transform. (iz) It is the Fourier trans- 
form of A(t) exp[j27(B — a)r?/2], where A(r) can be viewed as an 
envelope modulation on a chirp signal with frequency (8 — a)r. 
Equivalently, it is the convolution of the Fourier transform of A(r) 
and that of the chirp signal. The result may very well contain high 
frequency components depending on the magnitude of (8 — a) and the 
shape of A(r). In fact, simulation results show that the instantaneous 
frequency of y(t) can be quite different from (f; — at). Therefore, 
frequency predictability is difficult in the general case, and synchro- 
nous detection as discussed above cannot be used easily. 
The following is a summary of the analytic results: 

(t) The mathematical expressions are complicated, and simulation 
is necessary to obtain numerical insights. 

(ti) The compression operation can be justified by a quasi-station- 
ary model. Under this model and with envelope detection, multitone 
AM leads to nonequalizable distortion. Furthermore, the quasi-station- 
ary model is valid only for small TcRs. 

(tit) Synchronous detection is very difficult because of frequency 
unpredictability. 


lil. SIMULATION 


This section describes simulation results. These numerical results 
show that the proposed technique indeed time compresses the input 
signal, but the output distortion is probably unacceptable for TV 
transmission using today’s SAW devices. 


3.1 Preliminary set-up 


A computer subroutine was written to simulate the time-compres- 
sion operation. It accepts an arbitrary input A(¢) defined in the interval 
(0, T:) and outputs y(t), y-(t), and y.(t), given by eqs. (15), (21), and 
(22), respectively. Note that y(t) is the RF output of the LDF, and y-(¢) 
and y,(t) are the in-phase and quadrature components after synchro- 
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nous demodulation with ®(t) so that the ideal envelope detector 
output z(¢) can be easily calculated from eq. (23). Also note that all 
three outputs have the same integral form, i.e., 


l= | f(t)cos[2r(at? + bt + c)]dt, (33) 


where we can assume ¢; = 0, #2 = ti, and a > O. In the computer 
program, we break the interval (0, 7;) into many small segments such 
that within each segment, a linear approximation of A(t) is valid. After 
doing so, the integration over each of these segments can be done in 
the form of eq. (33) with 


f(t) = mt+k. (34) 


The limits of integration ¢, and ¢2 in eq. (33) are, of course, the end 
points of the small time segment. Using the linear representation eq. 
(34) for f(t), J in eq. (33) can be integrated in closed form in terms of 
Fresnel integrals, and the result is 














IT=mh+kh, (35) 
where 
I, = [2/(2p)]'{[C(22) — C(z1)]cos B — [S(z2) — S(z:)]sin B}, (36) 
i= ae {sin x2 — sin xi — g[7/(2p)]'°[C(22) — C(z:)]} 
+ > {cos x3 — cosxi + g[m/(2p)]'[S(z2) — S(z)}}, 0) 
and 
p=2na; q=27b; r= 2z¢, (38) 
2 
q 
Baap, 39 
: (=) (39) 
|= Vt; + ae a 
- i oF (40) 
= Vpts + aa. = 
X2 Pp 2Vp 
*i 
i 42 
— (42) 
X2 
= 43 
2 = (43) 
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The Fresnel integrals in eq. (36) are defined by 


C(z) = | cos (5 «) dx (44) 
‘ 2 


S(z) = | sin (; i) dx. (45) 
: } 


There are also some simpler cases, e.g., a = 0 or b = 0, which are not 

shown for brevity. Since time expansion can be achieved by reversing 

the delay slope of the LpF, the subroutine for simulating the time 

compression can also be used to simulate the time-expansion filter. 

Examples of both time compression and expansion will be shown later. 
In all subsequent simulations, the input is always 


and 


1, OStsT (T, = 64 ps) 


As i otherwise (46) 


This rectangular pulse is, of course, not a representative of the video 
signal. However, except for edge “ringings,” the system should com- 
press the pulse properly, and peak-to-peak (p-p) ripples at the center 
portion of the output pulse should give an indication of the magnitude 
of distortion involved. To examine the outputs y(¢), y-(¢), and y.(t), we 
have also developed a set of programs to estimate the instantaneous 
frequency at various time instants of y(t), as well as the p-p ripples of 
the output pulse. 

Because of the complexity of the equations and various computer 
routines, it is not easy to assure that there is no bug in the programs. 
However, we did make some runs for the special case a = £ (Fourier 
transform case) where results are known. The waveforms y-(¢), ys(t), 
and z(¢), and the output chirp frequency and offset frequency of the 
LDF given by eq. (31) were all verified carefully. Such a check assures 
the validity of the computer programs. 


3.2 Examples of time compression 


Four examples of time compression are discussed in this section. 
They all have TcR = 2 (i.e., 2:1 compression) and an input given by eq. 
(46) (see Fig. 5a). The chirp frequency range Af, for the input x(¢) to 
the LDF is (fz, fs), while the passband Af of the LDF extends over 
(fi, f4). The delays at f; are denoted by 7; (i = 1 to 4), respectively. The 
delay dispersion of the LDF is defined by At = 7; — 74. The descriptions 
for these examples are as follows: 

(t) The key parameters for the LDF are shown in Fig. 5b. In this 
example, we let fi = fo, fs = fs and choose 
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(a) 
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a ee 60--~-— 
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Fig. 5—Parameters for time-compression technique. Example 1—(a) Input pulse (¢); 
(b) Delay of LDF; (c) Expected output (¢). 


Af. = Af = 60 MHz. (47) 


From eq. (2), TCR = 2 with Ar = 32 ps. The time-bandwidth product 
(BT) of the LDF is defined by 


BT = (Af)(A7). (48) 


Here, BT = 1920. Bandwidth product ~ 10,000 to 50,000 is considered 
as a practical range for SAw filters. The “expected” output is illustrated 
in Figure 5c where the output compressed pulse appears between t = 
32 ps and 64 ps. The small ripples in the time intervals, 0 to 32 us and 
64 to 96 us are to illustrate the nonzero output of the LDF in these 
regions. 

(tt) The LDF parameters are shown in Fig. 6a and the expected 
output in Fig. 6b. Here, Af. = 60 MHz which is the same as in Example 
1 (Fig. 5), but Ar and Af are both increased by a factor of 3. With 
Af = 180 MHz and Ar = 96 ps, BT = 17,280. 

(zit) The parameters are shown in Fig. 7a, and the expected output 
is the same as that of Example 2 (Fig. 6) because the delay dispersion 
has remained the same. We increased Af to 600 MHz yielding BT = 
57,600, which is probably a little beyond present-day state of the art. 
We used Af. = 200 MHz. 

(tv) The parameters are shown in Figs. 6a and 6b. This is essentially 
the same as Example 3 in Fig. 7a, except Af is increased to 1200 MHz, 
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Fig. 6—Parameters for time-compression technique. Example 2—(a) Delay of LDF; 
(b) Expected output (¢). 


and BT = 115,200. This is probably not realizable in the near future. 
We used Af. = 400 MHz. 

The above examples are arranged to illustrate the effect of increasing 
Af (or the strengthening of the quasi-stationary model). 

With the input being a pulse 64-s long, the expected output for all 
examples is a compressed pulse, 32 ps in duration (TCR = 2). This is 
indeed so from the simulation results which show a compressed pulse 
approximately 32 ys long in the expected time slot. The output outside 
the compressed pulse duration is at least an order of magnitude lower. 
However, inside this compressed pulse, there are ripples created by 
the compression process, and these ripples are the resulting distortion 
that we are trying to estimate. The ripples are largest at the edges of 
the compressed pulse and smallest toward the center. Also, the ripples 
at the center are indications of the “best” performance of the time- 
compression filter. In our computer routines to estimate distortions, 
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32 


| Ren ers Lette [ere ore en 
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Fig. 7—Parameters for the time-compression technique. Example 3—(a) LDF (600 
MHz). Example 4—(b) LpF (1200 MHz). 
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we take some symmetric time interval, say tATu, about the center of 
the time-compressed pulse, and we record the maximum and minimum 
pulse magnitudes, denoted by Zmax and Zmin, respectively. The p-p 
ripple distortion (over +AT.z) is defined by 


RD 4 20 log ee : (49) 


(2inae + Zinind yf & 
where RD is in dB. Both 2max and Zmin are positive because A(t) is a 
positive pulse. We plot the p-p ripple distortion versus ATg in Fig. 8. 
The largest AT is 16 ys because that is the edge of the compressed 
pulse. Although it is difficult to translate the meaning of RD to a TV 
quality measure, it can be certain that a large RD (i.e., large ripple) 
would mean poor transmission quality. To make a Tv system workable, 
an RD of less than —45 dB is probably necessary, although other 
applications may not require such a low distortion. 

Referring to Fig. 8, it is obvious that the distortion gets progressively 
smaller from examples 1 to 4. But the lowest distortions, despite the 
large BTs involved, are still too excessive for high quality Tv transmis- 
sion. Some more examples are provided in Appendix B for additional 
insight into the problem of ripple distortion. 


3.3 An example of time compression and expansion 


In this example, we use practical design parameters involving both 
time compression and time expansion (Fig. 9). Figure 9a shows a 64- 
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Fig. 8—Distortion results for time-compression examples. 
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Fig. 9—Parameters for (a) time-compression and (b) time-expansion examples. 


ps pulse as an input to the compressor, and the expected output. The 
actual ripple distortion (i.e., simulated result) over the 32-us com- 
pressed .pulse is shown in Fig. 10a. To check the operation of the 
expander alone, we assume an input of a 32-us pulse to the expander. 
The expected result is shown in Fig. 9b, and the simulated ripple 
distortion is plotted in Fig. 10b. It is clear from Figs. 10a and 10b that 
both the compressor and the expander lead to considerable distortion. 
To simulate a total system with compression and expansion, we 
used the distorted 32-y1s pulse from the compressor as an input to the 
expander. The expanded output pulse is so distorted that it becomes 
meaningless to make a ripple distortion plot as before. Instead, we plot 
a small segment near the center of the expanded pulse in Fig. 10c to 
illustrate its excessive distortion. The performance of this total system 
is definitely not suitable for Tv transmission. Some filterings were tried 
at the output of the expander, but little improvement was obtained. 


IV. CONCLUSION 


We have studied a time-compression technique based on a simple 
configuration of saw filters. The technique was derived heuristically 
and can be viewed as a quasi-stationary model for the chirp signals. 
Numerical results show that excessive distortion is created, and its 
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Fig. 10—Results for time-compression and time-expansion examples. (a) Ripple dis- 
tortion of time-compression example. (b) Ripple distortion of time-expansion example. 
(c) Pulse ripple for time-compression and time-expansion examples. 


application to Tv transmission is not suitable unless some kind of 
equalization is provided. One such equalization is the chirp transform 
processor® which involves considerably more complexity. Simpler 
equalizations may be possible but do not seem to be straightforward. 
As for other applications where the distortion requirement is less 
stringent, this approach may be feasible. 
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APPENDIX A 
Impulse Response of a Linear Dispersive Filter 


A linear dispersive filter (LDF) is a filter having a linear delay 
characteristic. Using the analytic-signal notation, its transfer function 
can be written as 


1 _ Aw eee Aw 
Go) =4 7 2 7° =o, (50) 
0, elsewhere 
and 
A A 
D(w) = (To — Swo) + Sw, -> =w-—woX a (51) 


where G(w) and D(w) are the gain and group delay, respectively; wo is 
the angular center frequency, and Aw is the bandwidth in radians; 7» is 
the delay at wo; and Ar is the delay dispersion over wo + Aw/2. The 
delay slope is defined by 


Ar 
pals 
sf Aa (52) 
Integrating D(w) with respect to w, we obtain the phase, 
o(w) = —(To — Swo)w — 5 w, > <w-—uwt (53) 


where the integration constant has been dropped for convenience. 
Using exponential notation and neglecting all unimportant multiplying 
constants and constant phase shifts in subsequent discussions, the 
impulse response of the LDF is 


Aw 
wot+— 


h(t) = Re | . exp {~/ tr — Swo)w + 5 || exp(jwt)dw. (54) 


o-—— 


The above is simply an inverse Fourier transform of the transfer 
function. By a change of variable, it can be rewritten as 
Aw 
2 s 
A(t) = Re exp(Jwot) if exp| ( + 5 .)| exp(jwt)dw. (55) 


There are two methods to solve for h(t) according to the above: (z) 
extend the limits of integration to + and obtain a closed form solution 
easily; (ii) retain the finite integration limits and derive the result 
using Fresnel integrals. Both methods are shown briefly below. 
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Method! 


Changing the limits of integration to +0 in eq. (55) and recognizing 
that the integration of the term exp(—j7.w) leads to a delay of 70 in 
A(t), we obtain an integral in the form of 


J= | exp(—jkw”)exp(jwt)dw, (56) 


where k& is a constant. Putting the integrand in Gaussian form by 
completing the square, we get 


7 tr? on 
= \/— 7{—-——|]| |. 5 
1= yee |1(5-4) | en 
Applying the above to eq. (55), we obtain the desired result 
t = 2 
h(t) = Re exp(jwot)exp | (58) 
or 


eine 
h(t) = cos oe | (59) 
2s 


However, because the original transfer function has a delay dispersion 
Az defined about 70, the valid range of ¢ for eq. (59) is 


A A 
mS Stsmt>. (60) 


Method Il 
Setting wi = wo — Aw/2 and 71 = t — Ar/2, eq. (55) is written as 


Aw 
h(t) = Re exp(jwit) | exp |-7 (ne + - .)) exp(jwt)dw. (61) 
0 


Again recognizing that exp(—j7:w) leads to a delay of 71, we may 
substitute 7 = ¢ — 7; and 


Aw 
h(t) = Re exp(jw:r) i exp (-73 .*] exp(jwt)dw. (62) 
0 
Completing the square in the integrand, we obtain 


ry 8 r\ 
h(r) = Re exp(jwiT)exp ( x] | exp | 5 (« - 4 dw. (63) 
0 


The above can be integrated using Fresnel integrals, and the result is 
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: 7 
h(r) = Re exp ; (on + =) | 


X{[C(y2) — C()] —JLS(y2) — S(T}, (64) 


where 


7 At 


Ss T 
yo A ite (: = i). (66) 


C(z) and S(z) are the Fresnel integrals defined in eqs. (44) and (45). 
Consider the bracketed term: 


[C(y2) — C(y1)] — F{S(y2) — S(1)] A e(t)exp[7A(7)]. (67) 


Neglecting small ripples, p(7) is constant over 0 = 7 S Ar and vanishes 
outside this range. The phase term 6(7) is approximately constant over 
the same interval 0 <= 7 <= Ar. Therefore, putting back the 7, delay, we 


have 
ot SK 
h(t) = Re exp {i fe —7) + A] | (68) 


je RAI Bo (-z). (65) 


APPENDIX B 
Additional Examples On Ripple Distortion 


Additional examples are provided here for more insight into the 
phenomenon of ripple distortion. The first case of interest is that of 
band-limiting effect on the input pulse. We use a 32-ys input pulse and 
low-pass filter it with a raised-cosine characteristic, where the gain is 
unity from zero to 8 MHz, 0.5 at 9 MHz, zero at 10 MHz, and the delay 
is constant over the passband. The output is, of course, a 32-ys pulse 
with ripples. The magnitude of these ripples are plotted in Fig. 11 in 
the manner similar to Figure 8. 

The second case of interest is that of ripple distortion caused by the 
linear delay slope of the LpF. Three specific examples are provided to 
illustrate this effect: 

Case A: The input to the time-compression filter is a 32-us pulse. It 
is modulated by a CW frequency of fp = 1600 MHz. The LDF 
has a delay characteristic as shown in Fig. 1, where fo = 1600 
MHz and Af = 2400 MHz. 

Case B: The conditions are identical to those of Case A, extent r, = 
1000 MHz and Af = 1200 MHz. 

Case C: The conditions are identical to those of Case A, except fo = 
700 MHz and Af = 600 MHz. 
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Fig. 11—Distortions because of bandlimiting and delay slope. 


All three examples above illustrate the case of no time compression, 
but simply a constant delay through the LDF as viewed by the quasi- 
stationary model. Note that the magnitude of the delay slope increases 
by a factor of two from each example to the next and, hence, an 
increase in ripple distortion as seen from the results plotted in Fig. 11. 
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Conformal Mapping and Complex Coordinates 
in Cassegrainian and Gregorian Reflector 
Antennas 


By C. DRAGONE 


(Manuscript received June 18, 1981) 


In a Gregorian or Cassegrainian reflector antenna, the complex 
coordinate u’ of an output ray is related to the corresponding input 
coordinate u by a bilinear transformation, u’ = (au + b)/(cu + d). We 
discuss the properties of this transformation, derive its coefficients a, 
b, c, d, and give explicitly the conditions that must be satisfied in 
order that symmetry be preserved. The conditions are expressed 
directly in terms of the parameters that specify the path of the 
principal ray, which is the ray corresponding to the feed axis. The 
results are directly related to well-known properties of stereographic 
projections, and they are shown to be useful in the design of multi- 
reflector antennas which minimize aberrations and cross-polariza- 
tion. 


1. INTRODUCTION 


Gregorian and Cassegrainian reflector arrangements are needed for 
ground station and satellite antennas, and terrestrial radio relay sys- 
tems.’ In these antennas, a paraboloid of large aperture is combined 
with a smaller subreflector (an hyperboloid or an ellipsoid). The feed 
is placed at the antenna focal point, and it illuminates the subreflector 
with a spherical wave, which is then transformed by the two reflectors 
into a plane wave. Each input ray from the feed is thus transformed 
into an output ray parallel to the paraboloid axis. 

This transformation can be represented by a stereographic projec- 
tion.” Therefore, it is a conformal mapping—it transforms circles 
into circles, and it is described by the bilinear transformation 


_ aut b 
cu+d’ 


, 


(1) 
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where u is the “complex coordinate” of an input ray and w’ the 
corresponding output coordinate. In this article, we discuss the prop- 
erties of the bilinear transformation, derive its coefficients a, b, c, d, 
and give explicitly the conditions that must be satisfied to obtain 
circular symmetry, in which case 


b=c=0, d=1. (2) 


The results are related to well-known properties of stereographic 
projections, and they generalize previous results in Refs. 12 to 18. 

We first consider, in Section I, an ellipsoid illuminated by a spherical 
wave front S. We assume that S originates from one of the two foci of 
the ellipsoid, and determine the properties of a reflected wave front S’, 
assuming geometric optics. We determine for each point P’ of S’ the 
corresponding point P of the incident wave front S and show that the 
correspondence P — P’ is everywhere a conformal mapping. According 
to a well-known theorem of complex variables,? such a conformal 
mapping can be represented by the bilinear transformation (1), pro- 
vided suitable complex variables wu’ and u are defined for the rays 
through P’ and P. A suitable choice is obtained with two separate 
reference frames for P’ and P using the familiar relations” * 


0 a> 10 
u = e/*tan 3 u’ = e’* tan 3? (3) 


where @’, ¢’ and 0, ¢ are spherical coordinates. 

Since the two reference frames defining u and w’ can be oriented 
arbitrarily, eq. (1) implies that an arbitrary rotation of the input frame 
must transform the input coordinate u according to a bilinear trans- 
formation.”° The coefficients a, b, c, d of such a rotation are derived in 
Section V, where it is shown, for a rotation characterized by Euler 
angles a, B, y, that 


a=1, b = —e’*tan _ (4) 


c = e/‘tan 4 d = ef (+7), (5) 


Since an ellipsoid has an axis of symmetry, it is always possible to 
orient the input and output frames so as to reduce eq. (1) to the normal 
form 


u’ = au. (6) 


However, if the feed is centered around the z-axis of the input frame, 
then the reflected wave is blocked by the feed. For this reason, to avoid 
blockage, the z-axis must be tilted with respect to the ellipsoid axis. 
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Then, using eqs. (4) to (6) and properly orienting the input and output 
frames, it is shown in Section VI that eq. (1) assumes the form 
: u 
t =A = Dita (7) 
where M and i are the magnification and angle of incidence for the 
principal ray corresponding to the feed axis (i.e., they ray u = 0). 

The product of the two transformations given by eq. (6) and eqs. (4) 
to (5) gives the group of all possible transformations that can be 
obtained with an ellipsoid. One can show that this is the complete 
group of bilinear transformations. Thus, for any given values of the 
coefficients, a, b, c, d, it is always possible to find an ellipsoid (combined 
with suitable reference frames) which will produce the transformation 
(1) with the specified values of a, b, c, d. In Section VII, we consider 
an antenna consisting of N reflectors, each represented by a bilinear 
transformation. Obviously, the product of the N transformations is 
again a bilinear transformation and, therefore, the antenna can be 
represented by an equivalent ellipsoid. 

Most antennas are focused at ». Then the equivalent ellipsoid 
becomes a paraboloid, and the antenna can be represented as shown 
in Fig. 1, showing a feed illuminating the equivalent paraboloid from 
its focus 0. The coefficients a, b, c, d in this case are obtained by letting 
M — Oin eq. (7), and it is shown in Section VI that we then obtain, for 
the complex coordinate x’ + jy’ of an output ray, 


u 


x’ + jy’ = 2f (8) 


1—utani’ 
where uw is the input coordinate and f = OI is the focal length of the 
equivalent paraboloid. 

In Section VI, we derive a simple expression for the coefficient tan 
i in terms of the angles of incidence specifying the orientations of the 
various reflectors with respect to the principal ray. The value of tan 1 
is needed in the design of a multibeam. antenna to determine the 
aberrations caused by a small feed displacement from the focus. It is 
also needed to determine the output polarization, as shown in Section 
VI. 

Of special importance is the condition 


tan i = 0. (9) 


Then, using a corrugated feed” (or a feed with similar characteris- 
tics”’””) the output wave fronts become everywhere polarized in one 
direction. Furthermore, astigmatism is eliminated’® for small feed 
displacements in a multibeam antenna. The above condition can 
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XN 
~— PRINCIPAL RAY 


PARABOLOID 
WITH FOCUS O 


Fig. 1—A Cassegrainian or Gregorian reflector arrangement is represented by an 
equivalent paraboloid combined with a feed at 0. The principal ray corresponding to the 
feed axis is reflected at I with angle of incidence 1. 


15-17 as shown in 


always be satisfied by properly orienting the feed axis 
Section VIII. 

The above results are related to previous results by Brickell and 
Westcott,*"* Tanaka, Mizusawa,” Mizuguchi et al.,"° Hanfling,” and 
the author.”” There is a simple connection, pointed out in Section IX, 
between some of the expressions derived here and certain results in 
Refs. 13 and 14; the particular case, N = 2, is treated in those 
references. When only two reflectors are involved, there is always a 
common plane of symmetry, and the feed orientation can be derived 
geometrically as in Ref. 17. Our results differ from those of Refs. 15 
and 16 in two respects: first, they apply also for N > 2; second, tan 1 is 
expressed directly in terms of the parameters (magnifications and 
angles of incidence) that specify the path of the principal ray. As 
pointed out in Ref. 19, an important application of our results is in the 
theory of aberrations. 

The main results of this article are derived in Sections V through 
VII. Most of the results of Sections IJ through IV are well-known 
properties of ellipsoids, but their derivation is needed for Sections V 
through VII. In the following section, we discuss the transformation 
obtained when an ellipsoid is illuminated by a spherical wave front 
originating from one of the two foci. This transformation has the 
following basic property: it is a conformal mapping which gives cor- 
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rectly, not only the amplitude distribution of a reflected wave front, 
but also its polarization. All results of this article directly follow from 
this property. 


ll. CONFORMAL MAPPING, COMPLEX COORDINATES, AND THE 
BILINEAR TRANSFORMATION 


Let a linearly polarized point source be placed at O, one of the two 
foci of an ellipsoid, and let O’ be the other focus (in Fig. 2). Then for 
each ray from O, a corresponding ray through O’ is obtained after 
reflection by the ellipsoid. To determine the properties of this corre- 
spondence, introduce at the two foci separate coordinate systems x, y, 


REFLECTED 
RAY 






INCIDENT z 


x RAY 





Fig. 2—A ray from one of the two foci of an ellipsoid determines, after reflection, a 
ray through the other focus. The correspondence P — P’ between points of two wave 
fronts S and S’ is everywhere as conformal mapping described by eq. (1). 
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z and x’, y’, 2’ oriented arbitrarily, as shown in Fig. 2. Consider an 
incident ray from O with spherical coordinates 0, ¢, and let 6’, }’ be 
the spherical coordinates of the corresponding ray through O’. It is 
convenient to introduce as in Refs. 12, 13 complex coordinates u and 
u’ defined by eq. (3). Then, we show in this section that uw’ and u are 
related by a bilinear transformation. If both coordinate systems are 
right-handed, we shall see that the bilinear transformation does not 
relate u’ directly to u, but to its complex conjugate u*. To avoid this 
inconvenience, one of the two reference systems will be assumed to be 
left-handed as shown in Fig. 2. 

To better visualize the one-to-one correspondence between rays 
through the two foci, consider in Fig. 2 two wave fronts S and S’ 
centered at O and O’, respectively. Then, for each ray from O, one 
obtains on S and S’ two corresponding points P and P’. Furthermore, 
letting P be a variable point of a curve L of S, one obtains on S’ a 
variable point P’ describing a corresponding curve L’ of S’. Since the 
point source is linearly polarized, the curve L can be drawn so that it 
is everywhere tangent to the magnetic field. Then one obtains a 
polarization line of S, and it is shown in Appendix A that the corre- 
spondence P > P’ transforms polarization lines into polarization lines. 
That is, 


if L is a polarization line, 
then L’ is alsoa 
polarization line. (10) 


Another property is that angles are preserved and, therefore, the 
correspondence P — P’ is a conformal mapping. 

The above considerations apply also to an hyperboloid (in which 
case one of the two foci is behind the reflector), to a paraboloid, or to 
any combination of such reflectors. Thus, let the ellipsoid of Fig. 2 be 
combined with two paraboloids with foci at O and O’. Let the first 
paraboloid be centered around the z-axis, so as to map conformally 
the plane z = 0 onto the wave front S. Similarly, let the second 
paraboloid map S’ onto the plane z’ = 0. Then, the product of the 
above three reflections determines a one-to-one correspondence be- 
tween points U and U’ of the two planes z = 0 and 2’ = 0 in Fig. 2. 
This correspondence is everywhere conformal and, therefore, it implies 
a bilinear relation’ between the complex coordinates x + jy and x’ + 
Jy’ of two corresponding points U and U’. It is shown in the following 
section that the two paraboloids produce the transformations 


x+jy=2fou, x'+ Jy’ = 2fou’, (11) 


fo and fo being the focal lengths of the two paraboloids. Thus, the 
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desired result, eq. (1), follows at once. As pointed out earlier, eq. (1) 
requires that one of the two reference systems in Fig. 2 be left-handed. 


Ill. CONSTRUCTION OF A PARABOLOID BY A STEREOGRAPHIC 
PROJECTION 


The mapping between two wave fronts S and S’ in Fig. 2 can be 
represented as a product of two stereographic projections, as shown in 
Appendix B. In this section, we let O’ go to © on the z-axis. Then the 
ellipsoid degenerates into a paraboloid, S’ becomes a plane, and only 
one stereographic projection is needed.” 

Let the radius of S be chosen equal to the paraboloid focal length /f, 
and let S’ be the tangent plane z = fo. Let a correspondence P > P’ 
between points of S and S’ be obtained as shown in Fig. 3, with a 
stereographic projection from the axial point z = —fo. To show that 
this is the same correspondence determined by the rays reflected by 
the paraboloid, consider the reflected ray corresponding to P’, and let 
I be its intersection with the incident ray OP. Since the triangle PP’I 
is similar to the isosceles triangle NPO, PI = P’I and, therefore, 


OI = IP’ = fa, (12) 


which is the equation of a paraboloid. 
Notice, if p is the radial distance of the reflected ray from the z-axis, 
then from the triangle P’VN one has 


p = 2fotan : (13) 


PARABOLOID ~~ _ va 














je ——-—»— ——— > 





oO z=—-fy 
Fig. 3—Construction of a paraboloid by a stereographic projection. 
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Therefore, the reflected ray determines on the plane z = 0 a point U 
with the complex coordinate given by 


0 
x+y = 2foe’*tan 5 (14) 
which gives eq. (11). 


IV. TRANSFORMATION BY AN ELLIPSOID CENTERED AROUND THE 
Z-AXIS 

The usual construction of an ellipse (Fig. 4) involves two fixed points 
A; and A, and a variable point A3 which is varied, keeping the 
perimeter p of the triangle A;A2A3 constant. Then, A; describes an 
ellipse with foci A; and Ag, as shown in Fig. 4. A simple relation among 
the angles of the triangle A:A2As3 is given by the following theorem, 
which is derived in Appendix B with the help of two stereographic 
projections. 


Theorem: Given a triangle A,A2A3 with angles a, a2, a3, and 
perimeter p, its three sides dh, d2, d3 are given by 


2d; = (1 _ tan tan =), (15) 


where (1, m, n) is any permutation of (1, 2, 3) and d; = AmAn. 





oo 


Fig. 4—The angles ai, ae, a3 of a triangle A:A2A3 are related by eq. (15), which implies 
a linear relation between tan 6/2 and tan 6’/2. 
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By letting A; and A: coincide with O and O’ in Fig. 4, and letting A; 
be the point of incidence of a ray from O, one obtains from eq. (15) for 
[ = 3 the well-known relation 


1 + aotan > tan S = 0, (16) 


where ap is a constant determined by the distance d = d; between the 

two foci and by the path length t = OJO’, 
t+d 

= ———., 7 17 

ao a (17) 

If now the ellipsoid is centered around the z-axis as shown in Fig. 4, 

one has # = —a;. Furthermore, orienting the x’, y’, z’-axes as in Fig. 4, 

with the 2’-axis opposite the z-axis, we have ¢’ = ¢ and @’ = qw — az; 

therefore, eq. (16) gives 


u’ = aou. (18) 


Thus, for this particular orientation of the reference axes, eq. (1) 
assumes the normal form of eq. (6). If now arbitrary rotations are 
applied to the reference axes of Fig. 4, as we have seen in Section II, 
eq. (18) assumes the form of eq. (1). This implies that the above 
rotations transform u and w’ according to bilinear transformations, 
whose coefficients are derived next. 


V. ROTATIONS AND REFLECTIONS’? 


Consider Fig. 5a showing the x, y, z-axes oriented arbitrarily with 
respect to the x’, y’, z’-axes. We wish to determine, for a ray from O, 
the relationship between its coordinates 9’, ’ and @, with respect to 
the two coordinate systems. We have seen that the relationship can be 
written in the form (1), whose coefficients we now express in terms of 
the Euler angles a, 8, y specifying the orientation of the x’, y’, z’-axes 
with respect to the x, y, z-axes. Notice, for the purpose of determining 
the coefficients of eq. (1), consideration can be restricted to real values 
of u. 

The x’, y’, z’-axes in Fig. 5a can be obtained from the x, y, z-axes by 
three successive rotations: a rotation around the z-axis through the 
Kuler angle a, followed by a rotation around the y-axis involving the 
second Euler angle £ and, finally, a rotation around the z-axis by the 
third Euler angle y. The first and last rotations are described by the 
transformations 


u’=ue"™, uu’ =ue”™. (19) 


To determine the second transformation, consider Fig. 5b which illus- 
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(b) 





Fig. 5—The x’, y’, 2’-axes are obtained from the x, y, z-axes through (a) an arbitrary 
rotation, (b) a rotation around the y-axis, and (c) a reflection by a plane. 


trates a rotation around the y-axis by the angle 8. Then, for a ray in 
the xz-plane, 0’ = @ — B and, therefore, 


6 
tan — — tan e 
é 2 2 
tan — = ———_—_—_—_, (20) 
2 1 + tan tan B 
2 2 
which gives 
ut bo 
tA a 2 
a 1 — bow’ (21) 
with 
bo = —tan e. (22) 


The product of the above three rotations can now be calculated 
straightforwardly. Letting 


6 = —tan B e”*, (23) 
from eqs. (19), (21), and (22) we obtain 
, —jJlat 6 +u 
u’ =e! a omy Fy (24) 


which represents an arbitrary rotation. In the special case where the 
axis of rotation is orthogonal to the z-axis, we can verify that 


at+y=0 (25) 
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and eq. (24) reduces to 
, oO+u 
aes | a 


Eqs. (24) and (26) can be considered generalizations of the trigono- 
metric identity (20) to complex coordinates. They are directly related 
to the Cayley-Klein representation” of rotations by complex matrices. 


5.1 Reflections 


We now combine a rotation orthogonal to the z-axis with an inver- 
sion of the z-axis and obtain from eq. (26), replacing u with 1/u*, 


_ 1+ bu* 


u* — b*’ 


, 


Uu (27) 
representing a reflection by a plane shown in Fig. 5(c). The x’, y’, 2’- 
axes in Fig. 5(c) are the reflected images of the x, y, z-axes, and we 
shall see in a moment that the coefficient b in eq. (27) is given by 


_ Nx +jNy 
Nz 


where N,, N,, Nz are the x, y, z-components of a vector N orthogonal 
to the reflector. 

Eas. (23) through (28) give the transformation of u when a rotation 
(or a reflection) is applied to the reference axes. Suppose now the same 
rotation (or reflection) is applied to a ray with initial coordinate u, so 
as to obtain a new ray with coordinate uw’, as in Figs. 6(a) and (b). 
Then, if both uw’ and u are measured with respect to the x, y, 2-axes, 
we find* that a, y, 8 in eqs. (23) and (24) must be replaced with —a, 
—y, —8, whereas the coefficient 5 in eq. (27) is still given by eq. (28). 

To derive eq. (28), let a reflection be applied to the ray u = ™, as in 
Fig. 6(c). Then the angle formed by the z-axis and the reflected ray is 
bisected by N. Thus, since N is in the plane of incidence, and the angle 
of N with respect to the z-axis is 0/2, 


_ Na+ JjNy 
= 


b (28) 


, 


for u=, 


This gives the desired results of eq. (28). 
If, instead of a plane, the ray is reflected by an arbitrary surface 
z= f(x, y), then from eq. (28) one obtains 


* To show this, first let u’ be measured with respect to the x’, y’, z’-axes. Then one 
obtains the identity u’ = u. Next apply to the x’, y’, z’-axes the inverse transformation 
of eq. (24) or (27). The inverse of eq. (24) is a rotation with Euler angles —a, —y, —£, 
whereas the inverse of eq. (27) is a reflection with the same coefficient 6 of eq. (28). 
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ROTATION a, By ~~ 
N 





(a) (b) (c) 


Fig. 6—A ray with initial coordinate u is subjected in (a) to a rotation and in (b) to a 
reflection by a plane. Notice in (c) u’ = 6 for u = ». 


0 . 0 
b= (2+12) f(x, ¥), (29) 
and eq. (27) gives a simple relation between the ray coordinates and 
the partial derivatives of f(x, y). A similar result’ is obtained if the 
equation of the reflector is specified in spherical coordinates, as shown 
in Appendix C. 

Notice that eq. (27) applies not only to a ray, but also to its 
polarization in which case u and wu’ represent the directions of the 
incident and reflected polarization with respect to the x, y, z-axes 


VI. TRANSFORMATION BY AN ELLIPSOID WHEN THE OUTPUT FRAME 
IS THE MIRROR IMAGE OF THE INPUT FRAME 


In this section, we orient the z-axis in the direction of the principal 
ray. In a reflector antenna, this is the ray that corresponds to the feed 
axis. Thus, the principal ray determines the point of maximum illu- 
mination over the antenna aperture. To maximize aperture efficiency, 
the feed is usually oriented so that the principal ray u = 0 passes - 
through the center of the aperture. The ray u = ~, which leaves the 
feed in the direction opposite to the principal ray, will be called the 
cardinal ray. 

Consider Fig. 7 which shows an ellipsoid with the principal ray 
incident at J with angle of incidence 1. Let the x’, y’, z’-axes be the 
reflected images of the x, y, z-axes with respect to the tangent plane at 
I. Then, the principal ray after reflection has the direction of the z’- 
axis, whose complex coordinate wu = A with respect to the x, y, z-axes 
is given by 

el¥ 


tan i’ 
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Y 


Fig. 7—Reference frames implied by eq. (34). Notice the plane of incidence for the 
principal ray rotated by ~ with respect to the xz-plane. 


y being the angle of rotation of the plane of incidence with respect to 
the x-axis. 

Initially, assume ) = 0. Then the x-axis is in the plane of incidence, 
as shown in Fig. 8, and the same is true for the x’-axis. Thus, both 
reference systems can be obtained from those of Fig. 4 by suitable 
rotations around the y-axes, which have the same orientation in both 
cases. Taking into account that the coefficients of these rotations are 
real, they transform eq. (18) into 

u’ =a——, (30) 
1+ cu 
where a and c are real coefficients which can be determined as follows. 
To determine a, consider a ray in the vicinity of the principal ray. 
Then @ and @’ are small and 
Of = -0', 
and ¢@ being the distances of J from the two foci. It follows that a is 
equal to the magnification M given by 
o 
M=-—. 31 
7 (31) 
To determine c, let u = ©. We then obtain in Fig. 8 the cardinal ray 
incident at z’ with angle of incidence i’. From the triangle JI’O’ of 
Fig. 8, 
; 1 
bo Se for u=o. 
tan(zi + 1’) 
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PRINCIPAL 
~~ RAY 
7 
I 





~ 
~W CARDINAL 
RAY 


Ny 
~_ PRINCIP 
NOTE: €= 07, =07 sc! a AL 


Fig. 8—To obtain eq. (33), each reference system must be oriented so that the z-axis 
is in the direction of the principal ray, and the x-axis is in the plane of incidence. 


Furthermore, applying eq. (15) to the triangle JJ’O’ with perimeter 
p= 2t=2(¢ + @’), 


(0+ ¢@’)tani= 7 tan(i +17’), (32) 
which gives the desired result, 
c = (M — l1)tant. 
Finally, 


u 


‘= M—_—_ 
Bi 1+ (M— lutani 


(33) 
which assumes the x-axis is in the plane of incidence, so that y = 0. 

Now consider the general case y # 0. Then both reference systems 
in Fig. 8 must be rotated around the z-axes by —y, and from eq. (33) 
we obtain 


u 


‘= M ————————————“™ 
" 1+ u(M — le tan i’ 


(34) 
which applies in general when the plane of incidence is rotated by an 
arbitrary angle w with respect to the xz-plane. 

Using this simple relation, we can now determine straightforwardly 
how the nonzero angle of incidence i in Fig. 7 affects the amplitude 
pattern of the reflected wave, its polarization, its symmetry, and the 
aberrations arising when the point source is slightly displaced from O. 
For i = 0, eq. (34) reduces to eq. (18). In this case the transformation 
has circular symmetry, since it is unaffected if identical rotations are 

applied to the reference systems around the z-axes. For z ~ 0, on the 
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other hand, eq. (34) lacks this symmetry. We now show that, by 
properly combining several asymmetric transformations of the type 
(34), it is always possible to obtain the symmetric transformations 
(18). This was first shown in Refs. (15) and (16) for two reflectors with 
O’ at and, in Ref. (17) under more general conditions. 


Vil. TRANSFORMATION BY A SEQUENCE OF ELLIPSOIDS 


Replace the ellipsoid of Fig. 7 with a sequence of ellipsoids, with foci 
Oo, Or, «+ , On as shown in Fig. 9. Let the (s + 1)th reference frame 
be the mirror image of the sth frame as in Section VI. Let M,, is, Ws; be 
the values of M, i, / for the sth ellipsoid. Then, the product of the N 
transformations of Fig. 9 gives eq. (34) with 


M=M,.--- Mn (35) 
and 
(M — l)e“*tan i = (M, — le tan i 
+ (Mz —1)Mietan i2 +--+ (36) 


We have thus shown that eq. (34), derived in Section VI for the 
ellipsoid of Fig. 7, applies also to a sequence of N ellipsoids. It also 





Sth REFLECTOR 


Fig. 9—An input ray with coordinated u is transformed by a sequence of N reflectors 
into an output Tay with coordinate u’ given by eq. (34). The principal ray for u = 0 is 
reflected by the s‘ reflector at I, with angle of incidence i,. 


RAY TRANSFORMATION IN REFLECTOR ANTENNAS) 2411 


applies to a hyperboloid, in which case M < 0 since then one of the 
two foci O and O’ in Fig. 7 is behind the reflector and therefore either 
for ¢’ is negative. For 


M— o, 


the ellipsoid of Fig. 7 degenerates into a paraboloid, shown in Fig. 1 for 
wW = 0. Then, from eq. (34), letting 


Ww 
M c — 
— 0, ui > Ms, (37) 


we obtain 


u 


w = 2f (38) 


1l—utanie”’ 
where f = 7, and 
w=x' + y’ (39) 
is the complex coordinate intercepted in Fig. 1 by the reflected ray on 
the plane 2’ = 0. 
Equation (38) also applies to the arrangement of Fig. 9, with the last 


ellipsoid replaced by a paraboloid, in which case w and tan i are given 
by eq. (36) with 


M = Mn = 0. (40) 
Then f in eq. (38) is the equivalent focal length given by 
f = M.¢n, (41) 


where M, is the magnification determined by the first N — 1 reflectors, 
M.=M, --- My-1, (42) 


and ¢y is the focal length of the last paraboloid. 
As pointed out in the introduction, it is desirable in general that 


tani = 0, (43) 


because then the transformation has circular symmetry with respect 
to the principal ray. From eq. (36) for N = 2, this requires 


Wy = Yo (44) 
and 
tan 1,(M, — 1) + tan t2(M2 — 1)M, = 0. (45) 


The first condition demands that the two planes of incidence (for the 
principal ray) coincide, in which case the two reflectors and the feed 
have a common plane of symmetry. In general, for arbitrary N, one 
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finds that it is always possible to satisfy condition (43) by properly 
choosing one of the planes of incidence and one of the angles i, for any 
arbitrary choice of the remaining 7;. The correct choice for i; is obtained 
straightforwardly using eq. (36). In some cases, it may not be possible 
to satisfy exactly the requirement (43). For instance, the N reflectors 
may have to fit inside a satellite and, because of the limited available 
space, it may be convenient to choose 1 * 0. Then, the resulting 
aberrations and distortion of the polarization and amplitude illumi- 
nation over the aperture are determined straightforwardly using eq. 
(34), as pointed out in Ref. 19. 


Vill. GEOMETRICAL DERIVATION OF TAN / WHEN THE LAST 
REFLECTOR IS A PARABOLOID 


Assume the final reflector is a paraboloid, and let the input point 
source be a corrugated feed,” or a feed with similar radiation charac- 
teristics.”’” Then, the spherical wave radiated by the feed has an axis 
of circular symmetry, and its polarization lines on a wave front S are 
given by a set of tangent circles as shown in Fig. 10. The contact point 
D for these circles is one of the two intersections of the feed axis with 
the wave front S. The other intersection C is the point of maximum 
illumination. Thus, C and D are determined by the principal ray (u = 
0) and the cardinal ray (u = ©), respectively. It is now recalled that 
the bilinear transformation (1) transforms circles into circles. This 
means that the polarization lines of an output wave front S’ are also 
a set of tangent circles. Their contact point D’ is determined by the 


MULTIREFLECTOR 
ARRANGEMENT 
\ 


PRINCIPAL 
RAY ~ 





Fig. 10—The polarization lines produced by a corrugated feed are tangent circles with 
contact point determined by the cardinal ray. 
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ray u = ©, and the point of maximum illumination C’ is determined by 
the ray u = 0. From eq. (8) for u > ~, the distance of C’ from D’ is 
given by 





2 
Cp = f : (46) 
tant 
and, therefore, for tan i > 0 
D’ > ~, 


Then the circles degenerate into parallel lines, and the output wave 
front S’ becomes everywhere polarized in one direction.” 

The above considerations suggest a simple procedure for determin- 
ing the feed axis orientation that corresponds to tan 1 = 0. The feed 
must be oriented so that D’ — o. This means that the cardinal ray 
must be reflected at © by the last reflector (a paraboloid). Thus, the 
cardinal ray after the first N — 1 reflections must pass through the 
paraboloid focus Oy_; with direction opposite to Oy-1V, where V is 
the paraboloid vertex. Therefore, one must orient the feed axis so that 
the cardinal ray (u = ©) produces after N — 1 reflections the ray 
VOn-1. Since the final direction of this ray is specified, the initial 
direction can be determined by retracing the ray in the reverse direc- 
tion starting from Oy-1. Then, after (N — 1) reflections of the ray 
Oy-1V, one obtains through 0 the direction of OC characterized by tan 
i= 0. This geometrical derivation is illustrated in Ref. 17. 


IX. AMPLITUDE AND POLARIZATION OF THE OUTPUT WAVE 
Consider two corresponding points P and P’ on the two wave fronts 
S and S’ of Fig. 2. Let the electric field at P be given by 


eJkr 


E = Ae : (47) 
r 





where r is the distance from the focus O and e is a unit vector 
specifying the polarization of E. Similarly, for the field at P’ 
e TRtr'+t) ~ 


E = —A’e’ ———_, (48) 


- 
where t= ¢ + @’. Let 
€ = COS ¢eli + SiN del, (49) 


where ij, is are unit vectors in the 0, ¢-directions and ¢, is the angle of 
rotation of e with respect to i. In this section, we show that the 
corresponding angle of rotation ¢ for e’ with respect to ij is simply 
given by 
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be — be = $' — g, (50) 


where it is recalled that ¢’ and ¢ are the arguments of the coordinates 
u’ and u, respectively. For the amplitude A’ we show that 


Arata, (51) 
m 


where the magnification m is given by 
1 u’u’*(1 + uu*) 


~ Muu*(1 +u’u’*)” ee 


These relations apply in general to an arbitrary sequence of ellipsoids, 
hyperboloids, and paraboloids arranged as in Fig. 9. If the first focus 
O is at infinity, then S is a plane and the spherical coordinates 6, @ 
must be replaced with polar coordinates p, ¢. Then, i; is a unit vector 
in the p-direction. Similar considerations apply if O’ is at ~. 

To derive eqs. (50) and (52), it is convenient to combine the ellipsoid 
of Fig. 2 with two paraboloids, as in Section II. Then, one paraboloid 
maps conformally the sphere S onto the plane z = 0 and the other 
paraboloid the sphere S’ onto the plane z’ = 0. Let Ao, eo and Ao, eo be 
the values of A, e produced on the two planes, and assume the two 
mappings are characterized by the transformations 


u=x+tjy, wW=x' + sy’, 
which imply fo = fo = 1/2 in eq. (11). These transformations do not 
affect the polarization angles ¢ and ¢, while the amplitude A is 
transformed according to the well-known relation 
A A 


6 1+uu*’ 


pe eee 
1 + tan? — 
an 5) 


(53) 


and similarly for Ao. According to geometric optics, conservation of 
power requires 


| Ao|?doo = | Ao|?doo, (54) 
where doo and doo are the areas of two corresponding elements of the 
two planes. Since the mapping between the two planes is conformal, 


doo du’ ? 
= 55 
doo du |’ (99) 














and using eqs. (34), (53), and (54), one obtains the desired result, 
eq. (52). 

Next, we derive eq. (50). Consider two corresponding points @ and 
Q’ of the two planes. Let the polarization line through Q be a straight 
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line through the origin, as in Fig. 11. Then ¢ = 0, and the corresponding 
polarization line through Q’ is a circle. The circle must pass through 
the origin O’, and also through the point D’ of coordinate 


M 1, 
2 =a ut 56 
=" Melia? (56) 





which corresponds to the point at o of the u-plane, as one can verify 
from eq. (34) letting u — o. Now the angle made in Fig. 11 by the 
chord D’Q’ with the tangent e¢ is equal to the angle 8 subtended by 
the chord at O’. As a consequence, one can verify that the angle ¢: in 
Fig. 11 is equal to the angle y between D’Q’ and D’O’. Thus, 


co 





and one can verify that this agrees with eq. (50). Using eqs. (50) 
through (52), one can now derive straightforwardly the amplitude and 
polarization of the output wave in Fig. 9. 


X. CONCLUSIONS 


With simple geometric considerations, we have derived the coeffi- 
cients of the transformation (1). Once the parameters 1;, M;, , that 
specify the path of the principal ray are known, the coefficients can be 
derived straightforwardly using eqs. (34) and (36). For a corrugated 
feed, it has been shown in Section VIII that the circles describing the 
polarization of an output wave front can be determined straightfor- 


&9 


U-PLANE 






~_ POLARIZATION 
LINE THROUGH Q 


POLARIZATION _ 7 
LINE THROUGH Q’ 


Fig. 11—Derivation of the polarization in the u’-plane when the w-plane is polarized 
in the p-direction. 
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wardly by tracing the cardinal ray. In Section V, it has been shown 
how the transformation (35) is affected by a rotation of the feed axis. 
The results will be useful to the design of reflector antennas as pointed 
out in Ref. 19. They also provide a simple interpretation for previous 
results of Refs. 13 and 19, as pointed out in Appendix C. 


APPENDIX A 


Consider Fig. 2. We wish to show that if L is everywhere tangent to 
the magnetic field, then this is true also for L’. Suppose initially that 
S and S’ are both centered at O. Then, the mapping determined 
between S and S’ by aray from O is just a similarity, with magnification 
determined by the radii of S and S’. This means that each line element 
6L’ of L’ is parallel* to the corresponding line element 6L of L. Thus, 
if both S and S’ are centered at O, or both at O’, then the two curves 
L and L’ certainly satisfy (10). 

Next, let S and S’ be centered at O and O’, respectively, and let I be 
the point of incidence for the ray corresponding to 6L. Then, the 
orientation of the corresponding line element 6L’ is not affected if the 
ellipsoid in Fig. 2 is replaced by the tangent plane at J. Thus, we 
conclude that property (10) is true in general, even if S’ and S are 
centered at different foci. 

Notice (10) implies that the mapping between any two wave fronts 
S and S’ preserves angles and, therefore, it is conformal. 


APPENDIX B 


We show that the mapping in Fig. 2 between the two wave fronts S 
and S’ can be represented as a product of two stereographic projec- 
tions. A variety of different representations can be obtained, depending 
on the radii r and r’ of S and S’. For simplicity, here we choose the 
two radii so that the two spheres S and S’ touch each other, as shown 
in Fig. 12. The contact point V is on the axis OO’ of the ellipsoid. Let 
N and N’ be the other intersections of the two spheres with the axis. 
Let P; be an arbitrary point of the tangent plane at V, and let two 
corresponding rays OP and O’P’ be obtained as shown in Fig. 12, with 
two stereographic projections from N and N’, respectively. We now 
show that the intersection J of the two rays satisfies the condition 


OI +10O’=rt+r’, (58) 


which is the equation of an ellipsoid. Let 6 and @’ be the angles VOP 
and VO’P’, respectively. Then the isosceles triangle ONP has two of 


* This property, is true only if S and S’ are spherical wave fronts or if L is a geodesic 
line of S. 
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Fig. 12—A point J of an ellipse with foci O and O’ is obtained with two stereographic 
projections from N and N’. 


its angles equal to 0/2, and similarly the triangle O’N’P’ has two 
angles equal to 9’/2. Furthermore, since VP, is tangent to both spheres, 


NP, X PiP = N’P, X P,P’, 


since both products must equal VP. Thus, the triangles PP,P’ and 
NP,N’ are similar and, therefore, Pi; PP’ = 0’/2, PP’P, = 180° — 0/2. 
The angles PP’I and P’PI can now be determined, and one finds they 
are both equal to (9 + 6’)/2. Thus, PI = P’I, which gives eq. (58). 

From the right triangles Pi VN and P, VN’, since they have one side 
in common, 


4 


0 
r tan 3 =r’ tan 3° (59) 


which gives eq. (16), and this implies the theorem of Section IV. 


APPENDIX C 


We now point out a simple connection between eqs. (28) and (29) 
and previous results by Brickell and Westcott.’*** Both u and u’ will 
be measured with respect to the same reference frame, the x, y, z-axes. 

Let an arbitrary reflector be illuminated by a spherical wave from 
the origin O of the x, y, z-axes, and let the reflecting surface be given 
in spherical coordinates by 


p = plu, u*), 
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where p is the distance from the origin O, and p(u, u*) is an analytic 
function{ of u, u*. Consider a particular incident ray with coordinate 
u =A, and let P be its point of incidence. To determine its reflected 
coordinate u’ with respect to the x, y, z-coordinates, it is convenient to 
introduce temporarily a second reference frame with the z-axis through 
the point of incidence. Then, using the subscript (_ )o for the second 
frame, and applying eqs. (27) and (29) to the ray through P, 


-—1 
fe) 
w= o( 2) , for uw=0. (60) 


Next, we apply a suitable rotation to the xo, yo, Zo-axes, so as to 
transform Wo, Uo into uw’, u, 


uo+taA lo +X 


u’ = ——_—., u =———__.. 
1 — uod* 1 — uA* 
From the second expression for uo = 0, 
au au* 


— = (1+ uu"), = 0, 
OUo OUo 





and, therefore, 
re) du a ou* a 


0 
— +— — = (1 + wu*) —, 
Un = OUo OU = BU OU* du 
for UW) = 0. From these relations, taking into account that u = AX for 
Uo = 0, one obtains the final result © 


i) 
p+ (it uu*)u 
ou 


u! = ——_______—__, (61) 


0 
(1 + uu*) —u*p 


valid for any u. This gives eq. (16) of Ref. 13. We have thus shown that 
this basic result can be considered a direct consequence of eqs. (27) 
and (29). 
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C. Total paid circulation ............... 6,503 6,592 


D. Free distribution by mail, carrier or 
other means, samples, complementary, 


and other free copies ................ 4,651 5,047 
E. Total distribution 
(sum of C and D) ................. 11,154 11,639 


F. Copies not distributed 
1. Office use, left over, unaccounted, 


spoiled after printing .............. 783 790 

2. Returns from news agents ......... None None 
G. Total (sum of FE and F—should equal net 

press run shown in A) ............... 11,937 12,429 


I certify that the statements made by me above are correct and complete. 
Bernard G. King, Editor 
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